ABSTRACT
We study the approximation of two-layer compositions f(x) = g(φ(x)) by deep networks with ReLU activation, where φ is a geometrically intuitive, dimensionality-reducing feature map. We focus on two intuitive and practically relevant choices for φ: the projection onto a low-dimensional embedded submanifold and the distance to a collection of low-dimensional sets. We achieve near-optimal approximation rates, which depend only on the complexity of the dimensionality-reducing map φ rather than on the ambient dimension. Since φ encapsulates all nonlinear features that are material to the function f, this suggests that deep nets are faithful to an intrinsic dimension governed by f rather than by the complexity of the domain of f. In particular, the prevalent assumption of approximating functions on low-dimensional manifolds can be significantly relaxed by considering functions of the form f(x) = g(φ(x)) with φ an orthogonal projection onto that manifold.
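To make the composition structure concrete, the following is a minimal numerical sketch (not from the paper) of a function f(x) = g(φ(x)) in which φ is an orthogonal projection onto a low-dimensional subspace, a simple linear stand-in for an embedded submanifold; the choice of g, the dimensions, and the random construction are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration of f(x) = g(phi(x)), where phi is an
# orthogonal projection onto a d-dimensional subspace of R^D.
rng = np.random.default_rng(0)

D, d = 50, 3  # ambient dimension D, intrinsic dimension d
# Orthonormal basis of a random d-dimensional subspace via QR.
Q, _ = np.linalg.qr(rng.standard_normal((D, d)))
P = Q @ Q.T  # orthogonal projection matrix onto the subspace

def phi(x):
    # Dimensionality-reducing feature map: projection onto the subspace.
    return P @ x

def g(z):
    # An arbitrary smooth "outer" function (illustrative choice).
    return np.sin(z.sum())

def f(x):
    return g(phi(x))

# f is constant along directions orthogonal to the subspace, so its
# effective (intrinsic) dimension is d, not the ambient dimension D:
x = rng.standard_normal(D)
n = x - P @ x  # component of x orthogonal to the subspace
assert np.isclose(f(x), f(x + 5.0 * n))
```

The final assertion checks the key property: moving x in any direction orthogonal to the subspace leaves f unchanged, so only the d intrinsic coordinates matter to f.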