Fourier theory is pretty complicated mathematically. But there are some beautifully simple holistic concepts behind Fourier theory which are relatively easy to explain intuitively. There are other sites on the web that can give you the mathematical formulation of the Fourier transform. I will present only the basic intuitive insights here, as applied to spatial imagery.
Basic Principles: How space is represented by frequency
Higher Harmonics: "Ringing" effects
An Analog Analogy: The Optical Fourier Transform
Fourier Filtering: Image Processing using Fourier Transforms
Fourier theory states that any signal, in our case visual images, can be expressed as a sum of a series of sinusoids. In the case of imagery, these are sinusoidal variations in brightness across the image. For example the sinusoidal pattern shown below can be captured in a single Fouier term that encodes 1: the spatial frequency, 2: the magnitude (positive or negative), and 3: the phase.
These three values capture all of the information in the sinusoidal image. The spatial frequency is the frequency across space (the xaxis in this case) with which the brightness modulates. For example the image below shows another sinusoid with a higher spatial frequency.
The magnitude of the sinusoid corresponds to its contrast, or the difference between the darkest and brightest peaks of the image. A negative magnitude represents a contrastreversal, i.e. the brights become dark, and viceversa. The phase represents how the wave is shifted relative to the origin, in this case it represents how much the sinusoid is shifted left or right.
A Fourier transform encodes not just a single sinusoid, but a whole series of sinusoids through a range of spatial frequencies from zero (i.e. no modulation, i.e. the average brightness of the whole image) all the way up to the "nyquist frequency", i.e. the highest spatial frequency that can be encoded in the digital image, which is related to the resolution, or size of the pixels. The Fourier transform encodes all of the spatial frequencies present in an image simultaneously as follows. A signal containing only a single spatial frequency of frequency f is plotted as a single peak at point f along the spatial frequency axis, the height of that peak corresponding to the amplitude, or contrast of that sinusoidal signal.
There is also a "DC term" corresponding to zero frequency, that represents the average brightness across the whole image. A zero DC term would mean an image with average brightness of zero, which would mean the sinusoid alternated between positive and negative values in the brightness image. But since there is no such thing as a negative brightness, all real images have a positive DC term, as shown here too.
Actually, for mathematical reasons beyond the scope of this tutorial, the Fourier transform also plots a mirrorimage of the spatial frequency plot reflected across the origin, with spatial frequency increasing in both directions from the origin. For mathematical reasons beyond the scope of this explanation, these two plots are always mirrorimage reflections of each other, with identical peaks at f and f as shown below.
What I have shown is actually the Fourier transform of a single scan line of the sinusoidal image, which is a onedimensional signal. A full twodimensional Fourier transform performs a 1D transform on every scanline or row of the image, and another 1D transform on every column of the image, producing a 2D Fourier transform of the same size as the original image.
The image below shows a sinusoidal brightness image, and its twodimensional Fourier transform, presented here also as a brightness image. Every pixel of the Fourier image is a spatial frequency value, the magnitude of that value is encoded by the brightness of the pixel. In this case there is a bright pixel at the very center  this is the DC term, flanked by two bright pixels either side of the center, that encode the sinusoidal pattern. The brighter the peaks in the Fourier image, the higher the contrast in the brightness image. Since there is only one Fourier component in this simple image, all other values in the Fourier image are zero, depicted as black.


Here is another sinusoidal brightness image, this time with a lower spatial frequency, together with it's twodimensional Fourier transform showing three peaks as before, except this time the peaks representing the sinusoid are closer to the central DC term, indicating a lower spatial frequency.


The significant point is that the Fourier image encodes exactly the same information as the brightness image, except expressed in terms of amplitude as a function of spatial frequency, rather than brightness as a function of spatial displacement. An inverse Fourier transform of the Fourier image produces an exact pixelforpixel replica of the original brightness image.
The orientation of the sinusoid correlates with the orientation of the peaks in the Fourier image relative to the central DC point. In this case a tilted sinusoidal pattern creates a tilted pair of peaks in the Fourier image.


Different Fourier coefficients combine additively to produce combination patterns. For example the sinusoidal image shown below is computed as the sum of the tilted sinusoid shown above, and the vertical sinusoid of lower spatial frequency shown above that.


The brightness and the Fourier images are completely interchangable, because they contain exactly the same information. The combined brightness image shown above could have been produced by a pixelforpixel adding of the two brightness images, or by a pixelforpixel addition of the corresponding Fourier transforms, followed by an inverse transform to go back to the brightness domain. Either way the result would be exactly identical.
The basis set for the Fourier transform is the smooth sinusoidal function, which is optimized for expressing smooth rounded shapes. But the Fourier transform can actually represent any shape, even harsh rectilinear shapes with sharp boundaries, which are the most difficult to express in the Fourier code, because they need so many higher order terms, or higher harmonics. How these "square wave" functions are expressed as smooth sinusoids will be demonstrated by example.
The figure below shows four sinusoidal brightness images of spatial frequency 1, 3, 5, and 7. The first one, of frequency 1, is the fundamental, and the others are higher harmonics on that fundamental, because they are integer multiples of the fundamental frequency. These are in fact the "odd harmonics" on the fundamental, and each one exhibits a bright vertical band through the center of the image. The Fourier transform for each of these patterns is shown below.




The next table shows the result of progressively adding higher harmonics to the fundamental. Note how the central vertical band gets sharper and stronger with each additional higher harmonic, while the background drops down towards a uniform dark field. Note also how the higher harmonics produce peaks in the Fourier images that spread outward from the fundamental, defining a periodic pattern in frequency space.




The images below show what would happen if this process were continued all the way out to the Nyquist frequency  it would produce a thin vertical stripe in the brightness image, with sharp boundaries, i.e. a "square wave" in brightness along the x dimension. The Fourier transform of this image exhibits an "infinite" series of harmonics or higher order terms, although these do not actually go out to infinity due to the finite resolution of the original image. This is how the Fourier transform encodes sharp squarewave type features as the sum of a series of smooth sinusoids.


A great intuitive advance can be made in understanding the principles of the Fourier transform once you learn that a simple lens can perform a Fourier transform in realtime as follows. Place an image, for example a slide transparency, at the focal length of the lens, and illuminate that slide with coherent light, like a colimated laser beam. At the other focus of the lens place a frosted glass screen. Thats it! The lens will automatically perform a Fourier transform on the input image, and project it onto the frosted glass screen. For example if the input image is a sinusoidal grating, as shown below, the resultant Fourier image will have a bright spot at the center, the DC term, with two flanking peaks on either side, whose distance from the center will vary with the spatial frequency of the sinusoid.
We can now see the holistic principle behind the Fourier transform. Every point on the input image radiates an expanding cone of rays towards the lens, but since the image is at the focus of the lens, those rays will be refracted into a parallel beam that illuminates the entire image at the groundglass screen. In other words, every point of the input image is spread uniformly over the Fourier image, where constructive and destructive interference will automatically produce the proper Fourier representation.
Conversely, parallel rays from the entire input image are focused onto the single central point of the Fourier image, where it defines the central DC term by the average brightness of the input image.
Note that the optical Fourier transformer automatically operates in the reverse direction also, where it performs an inverse Fourier transform, converting the Fourier representation back into a spatial brightness image. Mathematically the forward and inverse transforms are identical except for a minus sign that reverses the direction of the computation.
I will now show how the Fourier transform can be used to perform filtering operations to adjust the spatial frequency content of an image. We begin with an input image shown below, and perform a Fourier transform on it, then we do an inverse transform to reconstruct the original image. This reconstructed image is identical, pixelforpixel, with the original brightness image.



I will now demonstrate how we can manipulate the transformed image to adjust its spatial frequency content, and then perform an inverse transform to produce the Fourier filtered image. We begin with a lowpass filter, i.e. a filter that allows the low spatialfrequency components to pass through, but cuts off the high spatial frequencies. Since the low frequency components are found near the central DC point, all we have to do is define a radius around the DC point, and zeroout every point in the Fourier image that is beyond that radius. In other words the lowpass filtered transform is identical to the central portion of the Fourier transform, with the rest of the Fourier image set to zero. An inverse Fourier transform applied to this lowpass filtered image produces the inverse transformed image shown below.


We see that the lowpass filtered image is blurred, preserving the low frequency broad smooth regions of dark and bright, but losing the sharp contours and crisp edges. Mathematically, lowpass filtering is equivalent to an optical blurring function.
Next we try the converse, highpass filtering, where we use the same spatial frequency threshold to define a radius in the Fourier image. All spatial frequency components that fall within that radius are eliminated, preserving only the higher spatial frequency components. After performing the inverse transform on this image we see the effect of highpass filtering, which is to preserve all of the sharp crisp edges from the original, but it loses the larger regions of dark and bright.


If the lowpass filtered inversetransformed image is added pixelforpixel to the highpass inversetransformed image, this would exactly restore the original unfiltered image. These images are complementary therefore, each one encodes the information which is missing from the other.
Next we will demonstrate a bandpass filtering that preserves only those spatial frequencies that fall within a band, greater than a low cutoff, but less than a higher cutoff.


The next simulation is the same as above, except with a narrower band of spatial frequencies.


The next simulation shows bandpass filtering about a higher spatialfrequency band,


and finally the same as above except again using a narrower spatialfrequency band.


These computer simulations demonstrate that the Fourier representation encodes image information in a holistic distributed manner that allows manipulation of the global information content of the image by spatial manipulations of the transformed image.