Great opening discussion.
I think the "synthesis" between the two views is found by examining the role played by randomness during training.
During training, the network's weights are adjusted by an algorithm called "stochastic gradient descent", or SGD (backpropagation, or BP, is the procedure used to compute the gradients that SGD consumes). The name SGD might seem quite intimidating, but the concept is not. Basically, imagine the error rate of the neural net as a landscape. Initially, when you begin training, you are at a very high peak, because the error rate is very high. You want to find the lowest point in the landscape (or a point at nearly the same altitude). How do you do that? Well, you find the direction of steepest descent, the direction in which the gradient falls away most sharply, and you follow it. This is called gradient descent (GD), and it is how classical training is done. But there's a problem. Because GD is deterministic, if you had started at a very slightly different point on the mountain peak, you could have ended up at a completely different "low point" which may actually be nowhere near the lowest altitude (lowest error rate). This is called a "local minimum": think of a bowl-shaped valley at the top of an old mountain; the valley is the lowest point near itself, but the whole valley is still at a high altitude. If you had just happened to head down into that valley, instead of down the outer sides of the mountain, you would have gotten stuck in a local minimum. SGD adds stochasticity to gradient descent (in practice the randomness comes from sampling a random subset of the training data at each step, but the effect is a random perturbation of the descent), which acts a little bit like carrying a big pogo-stick which you occasionally pull out and use to take a random leap in some direction. By doing this, you greatly improve your chances of finding a global minimum (or a minimum close to the global minimum), and significantly reduce your chances of ending up trapped in a local minimum. In the case of the bowl-shaped mountain-top valley, if you got your pogo-stick out while you were part way down the descent into the valley, you could end up back on the outside slope of the mountain and escape the local minimum.
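To make the pogo-stick picture concrete, here is a minimal, purely illustrative Python sketch (a made-up one-dimensional error landscape with two valleys, not any real training setup) contrasting deterministic gradient descent, which settles into the nearer, shallower valley, with a noisy SGD-like variant that can hop out of it:

```python
import random

def loss(x):
    # A made-up 1-D "error landscape" with a shallow valley near x ≈ -0.7
    # and a much deeper one near x ≈ 2.2. Purely illustrative.
    return x**4 - 2*x**3 - 3*x**2 + 5

def grad(x):
    # Derivative of loss(x): the "slope" at the current position.
    return 4*x**3 - 6*x**2 - 6*x

def descend(x, steps=10_000, lr=0.001, noise=0.0):
    # noise=0.0 is plain deterministic gradient descent;
    # noise>0 plays the role of the occasional pogo-stick leap.
    for _ in range(steps):
        kick = noise * random.gauss(0, 1) if noise else 0.0
        x -= lr * grad(x) + kick
    return x

random.seed(0)
print(descend(-0.5))              # deterministic: stuck in the shallow valley
print(descend(-0.5, noise=0.05))  # noisy: typically escapes to the deep one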
OK, so far so good; this is just an algorithm like any other algorithm. However, let's examine very closely the source of randomness in this procedure. There is a concept in computational complexity theory called derandomization. Roughly speaking: under widely believed hardness assumptions (the same kind that give us cryptographic pseudorandom generators), any algorithm that uses randomness can have its random source replaced by a pseudorandom one without observably altering the algorithm's behavior: its running time (up to modest overhead), its accuracy, and so on. What this means is that, for every randomized algorithm A you give me (a "randomized algorithm" is any algorithm that consumes random bits), I can give you back an algorithm A' that satisfies all the same observable properties as A, but whose "random" bits actually come from a deterministic source. Thus, there exists a pseudorandomized version of SGD that performs just as well as SGD does.
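As a sketch of what that substitution looks like in practice (illustrative only; as noted above, the full complexity-theoretic statement rests on hardness assumptions): a randomized algorithm that only sees its random bits through a callable can be handed a seeded pseudorandom generator instead of a true entropy source, and from the inside it cannot tell the difference:

```python
import random

def randomized_search(candidates, is_good, rand, tries=1000):
    # A generic randomized algorithm: it only touches randomness via rand().
    for _ in range(tries):
        pick = candidates[rand() % len(candidates)]
        if is_good(pick):
            return pick
    return None

true_rand = lambda: random.SystemRandom().randrange(2**32)  # OS entropy
prg = random.Random(1337)                                   # fixed, known seed
pseudo_rand = lambda: prg.randrange(2**32)                  # fully determined

# Same algorithm, two sources: behavior looks the same from the inside,
# but the second run is reproducible by anyone who knows the seed.
print(randomized_search(range(100), lambda x: x % 17 == 0, true_rand))
print(randomized_search(range(100), lambda x: x % 17 == 0, pseudo_rand))
```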
Now, let's move to the domain of password cracking. In cracking, we would love to fool a user into believing he has used a truly random seed when, in fact, it is a pseudorandom (non-random) seed which we ourselves control. In this way, we have fooled the user into choosing a password/key that he believes is random when, in fact, it is not, and we can quickly reconstruct the key without having to do a brute-force search. You can think of this kind of password-generation attack as an instance of derandomization. Password generation is a randomized algorithm that just consists of saying, "give me X random bits". Derandomizing that algorithm allows me to satisfy whatever constraints you have on your password generator ("use lower-case, upper-case, numerals, special characters", etc.), but in a way where the result that is generated is actually not random.
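Here is a minimal sketch of that attack, with hypothetical names: the generator satisfies the usual character-class constraints and looks perfectly ordinary to the user, yet anyone who knows the seed can regenerate the "random" password instantly:

```python
import random
import string

# Satisfies the typical constraints: lower, upper, numerals, specials.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "!@#$%"

def generate_password(rng, length=16):
    # Looks like a normal constrained password generator to the user.
    return "".join(rng.choice(ALPHABET) for _ in range(length))

SECRET_SEED = 0xDEADBEEF  # hypothetical seed, known only to the attacker

victim_pw = generate_password(random.Random(SECRET_SEED))
attacker_pw = generate_password(random.Random(SECRET_SEED))
assert victim_pw == attacker_pw  # no brute-force needed: just replay the seed
```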
Now, let's say I'm a villain with access to some kind of cosmic-scale computing resource. I want your AI training to complete in such a way that you believe you have trained it using true randomness in the SGD algorithm. However, I also want to satisfy an additional constraint: whenever the neural net (NN) hears the word "frobnicate", it goes into some kind of kill-mode. By virtue of derandomization, we know that this is possible and, in addition, that you will never realize it has been done, even if you watch the NN being trained, step by step. The way I do this is by interposing my own number generator into your SGD algorithm, so that when your SGD asks, "give me a random number!", I supply a number chosen to guide your SGD down a particular path of my choosing. That is, when you get on your pogo-stick, the place where you land is actually not random. That I can perform this subtle manipulation of your training procedure is possible precisely because of derandomization: you give me SGD (a randomized algorithm), and I give you back some SGD' which seems to you indistinguishable from SGD but which, in actual fact, I control, at least to some degree.
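A toy sketch of the interposition (all names hypothetical; real SGD's randomness comes from minibatch sampling, but the interface is the same: the optimizer consumes numbers it assumes are honest). The training loop asks its injected noise source for a perturbation, and an adversarial source answers with small values secretly biased toward a weight the trainer's own objective never asked for:

```python
import random

def grad(w):
    # Honest loss gradient: the trainer's objective pulls w toward 0.
    return w

def train(w, noise, steps=2000, lr=0.01):
    for _ in range(steps):
        # The trainer believes noise(w) returns honest zero-mean jitter.
        w = w - lr * grad(w) + noise(w)
    return w

def honest(w):
    return random.gauss(0, 0.001)

def adversarial(w, target=3.0, nudge=0.05):
    # Small "noise" values secretly biased toward the adversary's
    # chosen weight; the noise source smuggles in w ≈ 3.
    return nudge if w < target else -nudge

random.seed(0)
print(train(1.0, honest))       # lands near the honest optimum, w ≈ 0
print(train(1.0, adversarial))  # lands near the adversary's target, w ≈ 3
```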
Back to the original discussion in the video: this shows that "rolling the dice" during SGD training of these NNs is in the same category as divination once these systems are applied to real-world action. Please understand that I'm not making a blanket claim that ML is witchcraft, or anything like that. I'm making a very narrow claim: when these systems are given agency and carry out real-world actions, those actions are susceptible to influence by witchcraft, because the random-number generation process itself cannot be distinguished from divination.
A simpler way to see this is to imagine a man who has a tablet that continually gives him three options for what he can do next. The tablet sits in front of him and reads something like: "A) Go straight down the sidewalk. B) Turn right into the coffee shop. C) Cross the street at the crosswalk." As he chooses what to do, he taps the screen to update it and gets the next set of options. He is only permitted to take one of the actions on the tablet, and nothing else. After a while, he gets tired of consciously choosing what to do next, so he begins rolling a d3 instead.
That is divination, plain and simple. And it is what I am asserting AI-guided robotic agents in real space are doing as well; they're just doing it at zillions of cycles per second.
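For what it's worth, the tablet scenario is a three-line program (options hypothetical): the man's "choice" reduces to a single call into an RNG, and whoever supplies that RNG controls the walk:

```python
import random

options = ["go straight down the sidewalk",
           "turn right into the coffee shop",
           "cross the street at the crosswalk"]

def next_action(rng=random):
    # Rolling the d3: whoever supplies rng controls the walk.
    return rng.choice(options)

print(next_action())                   # the man's own die
print(next_action(random.Random(42)))  # someone else's seeded "die"
```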
Thus, we don't have to think of robots as either "mere tools" or "self-directed agents". Rather, they are extensions of human agency which operate, insofar as they rely on apparent randomness, by divination. A final note of clarification: I'm not saying that all forms of robotics are witchcraft and sin. However, because of what I have explained here, we can confidently predict that giving robots agency at parity with humans will inevitably result in tragedy, because they are completely susceptible to influence through witchcraft. And I am particularly identifying the "point of entry" as the random-number generator.