You should be a bit more specific about what your task is. If it is to evaluate the gradient at a given point once, using an oracle that tells you the value of the function, then $N$ calls are unavoidable if you want any decent precision. Indeed, assume that you already know the value at the point and that the function is a random linear function with gradient of size $\sqrt N$. The first call will tell you only the component of the gradient in a single direction, which, with high probability, is about $1$ in absolute value. After that, you are left with the task of finding the gradient of a linear function on $\mathbb R^{N-1}$ of size $\sqrt{N-1}$.
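To spell out the information count (writing $g$ for the gradient of a linear $f$, $x_0$ for the base point, and $e$ for a unit direction; this notation is mine, not from the question): one extra evaluation reveals only the single number
$$\frac{f(x_0+he)-f(x_0)}{h}=\langle g,e\rangle,$$
so each call pins down one component of $g$ in some orthonormal frame, and about $N$ calls are needed to recover all of them.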
The gradient approximation you cited is no good for finding the actual gradient, but it is good enough for finding a vector whose scalar product with the gradient is about $N^{-1/2}$ (assuming that the gradient has length $1$). Thus, a unit step in that direction decreases the function by about $N^{-1/2}$, as opposed to the decrease by $1$ if the true gradient were used. Still, after $\sqrt N$ steps with this fake gradient you go down as much as with one step with the true gradient, whose evaluation would take $N$ function calls. This gives you a gain of about $\sqrt N$ in time if the computation of the function is the most expensive part of the process, so it may make sense, provided that the second differential is not too large, etc.
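For concreteness, here is a small numerical sketch of that trade-off (not part of the original answer): the quadratic test function, the dimension, and the unit step size below are illustrative choices, and the starting point is scaled so that the gradient is large compared to the second differential, as the caveat above requires.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400                              # dimension
A_diag = rng.uniform(0.5, 1.5, N)    # mild, fixed curvature

def f(x):
    # smooth test function; its gradient at x is A_diag * x
    return 0.5 * np.dot(A_diag, x * x)

calls = 0
def f_counted(x):
    global calls
    calls += 1
    return f(x)

h = 1e-5  # finite-difference increment

def full_gradient_step(x, step):
    # N forward-difference calls (plus one at x) estimate the whole gradient
    f0 = f_counted(x)
    g = np.array([(f_counted(x + h * e) - f0) / h for e in np.eye(N)])
    return x - step * g / np.linalg.norm(g)   # unit step along the (near-)true gradient

def random_direction_step(x, step):
    # 2 calls: directional derivative along a random unit vector u;
    # u correlates with the gradient only at the level ~ |grad f| / sqrt(N)
    u = rng.standard_normal(N)
    u /= np.linalg.norm(u)
    df = (f_counted(x + h * u) - f_counted(x)) / h
    return x - step * np.sign(df) * u          # unit step along +/- u, downhill

x0 = 10.0 * rng.standard_normal(N)   # large |x0| keeps the gradient dominant over curvature

# One "true gradient" step: about N + 1 calls.
calls = 0
x_full = full_gradient_step(x0.copy(), step=1.0)
print("full gradient: calls =", calls, " decrease =", f(x0) - f(x_full))

# About sqrt(N) "fake gradient" steps: about 2 * sqrt(N) calls.
calls = 0
x_rand = x0.copy()
for _ in range(int(np.sqrt(N))):
    x_rand = random_direction_step(x_rand, step=1.0)
print("random dirs:   calls =", calls, " decrease =", f(x0) - f(x_rand))
```

With these (arbitrary) settings the two runs produce decreases of the same order, while the random-direction version uses roughly $\sqrt N$ times fewer function calls, which is exactly the gain described above.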