You can think of approximation as being to functions what interpolation is to data, but the two differ in how they work. Approximation is typically an error-minimization process: you start with a family of functions and look for the one that minimizes some distance metric (such as least-squares error). The goal of interpolation, on the other hand, is generally to construct a function that passes through each point in a collection of points. If you have an underlying model that specifies a particular functional form, then approximation is the goal, especially when your data comes from a process that involves noise. The distinction from interpolation can vanish, however, if you regard your data as pointwise evaluations of a function made without error.
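Here is a minimal sketch of that contrast, assuming NumPy: the same noisy samples are fit with a low-degree least-squares polynomial (approximation) and with a polynomial of high enough degree to pass through every point (interpolation). The sample function and noise level are just illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)  # noisy samples of a hypothetical f

# Approximation: a cubic least-squares fit, which need not hit any data point exactly.
approx = np.polynomial.Polynomial.fit(x, y, deg=3)

# Interpolation: degree n-1 through n points, which reproduces every data point.
interp = np.polynomial.Polynomial.fit(x, y, deg=x.size - 1)

print(np.max(np.abs(approx(x) - y)))  # generally nonzero
print(np.max(np.abs(interp(x) - y)))  # essentially zero, up to rounding
```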
Suppose we have a general function $f$, which may or may not be a member of a function space $\Phi$. The goal is to find a function $\hat{\varphi} \in \Phi$ such that $\lVert f - \hat{\varphi} \rVert \leq \lVert f - \varphi \rVert$ for all $\varphi \in \Phi$. That is, among all the functions in the space, we want the one that minimizes the distance to $f$. Note that we have left the norm unspecified: choosing an appropriate norm is often part of the approximation problem, and a different choice of norm can lead to a different solution. In this framework, we often call $\hat{\varphi}$ an approximant of $f$.
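To see how the choice of norm changes the answer, here is a small sketch (again assuming NumPy) using the simplest possible space $\Phi$: the constant functions. For sampled data, the best constant under the 2-norm is the mean, under the 1-norm the median, and under the max-norm the midrange, so the same data yields three different approximants.

```python
import numpy as np

y = np.array([0.0, 0.0, 1.0, 5.0])   # hypothetical samples of f

best_l2  = y.mean()                  # minimizes sum of (y_i - c)^2
best_l1  = np.median(y)              # minimizes sum of |y_i - c|
best_inf = (y.min() + y.max()) / 2   # minimizes max |y_i - c|

print(best_l2, best_l1, best_inf)    # 1.5, 0.5, 2.5
```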
In practice, we often have data points that are evaluations of $f$, which we may call the data-generating process. If, for example, $f$ is a real-valued function of time, as in a time series, we may have data points $y_i = f(t_i)$ for $i = 1,\dots,n$. Or consider temperature measurements at a fixed time along a uniform (idealized, one-dimensional) rod, as in the heat equation, giving $y_i = f(x_i)$ for points $i = 1,\dots,n$. This can be generalized to the full evolution of the heat equation over time as $y_{i,j} = f(x_i,t_j)$ for points $i = 1,\dots,n$ and times $j = 1,\dots,m$, but the most straightforward procedures we'll see for a while deal with real-valued functions of a single variable.
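As a concrete (and purely illustrative) version of those two setups, the sketch below samples a hypothetical time series $y_i = f(t_i)$ and a separable solution of the one-dimensional heat equation on a space-time grid $y_{i,j} = f(x_i, t_j)$; the particular functions are stand-ins, not fitted models.

```python
import numpy as np

f = lambda t: np.exp(-t) * np.cos(t)          # hypothetical real-valued function of time
t = np.linspace(0.0, 5.0, 50)
y = f(t)                                      # y_i = f(t_i)

# One decaying mode of the heat equation on [0, 1] with zero boundary values.
g = lambda x, t: np.exp(-np.pi**2 * t) * np.sin(np.pi * x)
x  = np.linspace(0.0, 1.0, 20)
ts = np.linspace(0.0, 0.5, 10)
Y = g(x[:, None], ts[None, :])                # Y[i, j] = f(x_i, t_j)
```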
The approximants typically come from a well-defined class of functions, chosen either for nice computational properties or because they correspond to some known model. As examples of the former, polynomials and trigonometric functions are often computationally straightforward to work with in this setting. As an example of the latter, a linear regression model in statistics gives a linear (or, more properly, affine) line of best fit, assuming a linear response in the dependent variable and (typically) normally distributed error.
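A minimal sketch of that regression example, assuming NumPy: an affine least-squares fit $y \approx a + bx$ to data generated (for illustration) with a linear response plus Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)  # linear response + Gaussian noise

A = np.column_stack([np.ones_like(x), x])     # design matrix for an affine fit
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b)                                   # least-squares estimates of intercept and slope
```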