Let’s compare to how NumPy works.
>>> import numpy as np
>>> x = np.array(1.0)
>>> y = x + 1.0
At this point there are two arrays in memory, x and y. The array y has the value 2.0, but there’s no record of how it came to have that value. The addition has left no record of itself.
TensorFlow is different.
>>> x = tf.Variable(1.0)
>>> y = x + 1.0
Now only x is a TensorFlow variable; y is an add op, which can return the result of that addition if we ever run it.
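If you inspect y, you’ll see a tensor object rather than a number; a value only appears when the op is run in a session. A minimal TensorFlow 1.x sketch (the exact tensor name and printed form may vary by version):
>>> y
## <tf.Tensor 'add:0' shape=() dtype=float32>
>>> session = tf.Session()
>>> session.run(tf.global_variables_initializer())
>>> session.run(y)
## 2.0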
One more comparison.
>>> x = np.array(1.0)
>>> y = x + 1.0
>>> y = x + 1.0
Here y is assigned to refer to one result array, x + 1.0, and then reassigned to point to a different one. The first one will be garbage collected and disappear.
>>> x = tf.Variable(1.0)
>>> y = x + 1.0
>>> y = x + 1.0
In this case, y refers to one add op in the TensorFlow graph, and then y is reassigned to point to a different add op in the graph. Since y only points to the second add now, we don’t have a convenient way to work with the first one. But both add ops are still around, in the graph, and will stay there.
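You can verify this by listing the operations in the default graph (TensorFlow 1.x API); both add ops show up, along with the ops that back the variable. The exact names below are illustrative and depend on what else is in your graph:
>>> [op.name for op in tf.get_default_graph().get_operations()]
## something like ['Variable/initial_value', 'Variable', 'Variable/Assign', 'Variable/read', 'add', 'add_1']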
(As an aside, Python’s mechanism for defining class-specific addition and so on, which is how + is made to create TensorFlow ops, is pretty neat.)
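Here’s a toy sketch of that mechanism, not TensorFlow’s actual implementation: a class that defines __add__ can make + record an operation instead of performing it.
class Node:
    def __init__(self, op, inputs):
        self.op, self.inputs = op, inputs
    def __add__(self, other):
        # build a new graph node instead of computing a value
        return Node("add", [self, other])

a = Node("variable", [])
b = a + 1.0  # Python calls a.__add__(1.0), producing another Node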
Especially if you’re just working with the default graph and running interactively in a regular REPL or a notebook, you can end up with a lot of abandoned ops in your graph. Every time you re-run a notebook cell that defines any graph ops, you aren’t just redefining ops — you’re creating new ones.
Often it’s okay to have a few extra ops floating around when you’re experimenting. But things can get out of hand.
for _ in range(int(1e6)):
    x = x + 1
If x is a NumPy array, or just a regular Python number, this will run in constant memory and finish with one value for x. But if x is a TensorFlow variable, there will be over a million ops in your TensorFlow graph, just defining a computation and not even doing it.
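You can check for this kind of accumulation by counting the operations in the default graph, and if an interactive session has gotten out of hand, tf.reset_default_graph() clears the default graph so you can start over (variables and sessions tied to the old graph stop being usable). Both calls are TensorFlow 1.x API:
>>> len(tf.get_default_graph().get_operations())  # grows with every op you define; now in the millions
>>> tf.reset_default_graph()  # start fresh with an empty default graph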
One immediate fix for TensorFlow is to use a tf.assign op, which gives behavior more like what you might expect.
increment_x = tf.assign(x, x + 1)
for _ in range(int(1e6)):
    session.run(increment_x)
This revised version creates no ops inside the loop, which is generally good practice. TensorFlow does have its own control flow constructs, including while loops, but only use these when really needed.
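For reference, here’s roughly what the graph-native version of counting to a million looks like; this is a sketch assuming the TensorFlow 1.x API, and tf.while_loop adds only a fixed handful of ops to the graph no matter how many iterations run:
i = tf.constant(0)
result = tf.while_loop(lambda i: tf.less(i, 1000000),  # loop condition
                       lambda i: tf.add(i, 1),         # loop body
                       [i])
# session.run(result) executes all the iterations at run time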
Be conscious of when you’re creating ops, and only create the ones you need. Try to keep op creation distinct from op execution. And after interactive experimentation, eventually get to a state, probably in a script, where you’re only creating the ops that you need.
Avoid constants in the graph
A particularly unfortunate thing to add to a graph needlessly is an accidental constant op, especially a large one.
>>> many_ones = np.ones((1000, 1000))
There are a million ones in the NumPy array many_ones. We can add them up.
>>> many_ones.sum()
## 1000000.0
What if we add them up with TensorFlow?
>>> session.run(tf.reduce_sum(many_ones))
## 1000000.0
The result is the same, but the mechanism is quite different. This not only added some ops to the graph — it put a copy of the entire million-element array into the graph as a constant.
Variations on this pattern can result in accidentally loading an entire data set into the graph as constants. A program might still run for small data sets, or your system might fail.
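One way to notice the problem, assuming the TensorFlow 1.x API, is to check how large the serialized graph has become; the graph is a protocol buffer, and embedded constants count directly toward its size:
>>> tf.get_default_graph().as_graph_def().ByteSize()  # in bytes; the million float64s alone account for about 8 MB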
One simple way to avoid storing data in the graph is to use the feed_dict mechanism.
>>> many_things = tf.placeholder(tf.float64)
>>> adder = tf.reduce_sum(many_things)
>>> session.run(adder, feed_dict={many_things: many_ones})
## 1000000.0
As before, be clear about what you’re adding to the graph and when. Concrete data should usually enter the computation only at moments of evaluation, fed in rather than baked into the graph.
Source: https://www.kdnuggets.com/2017/05/how-not-program-tensorflow-graph.html