Superintelligence, a review in many parts

To continue my response to slatestarscratchpad’s question about people who have read Bostrom and remain unconvinced, I thought I’d write a longer series of posts.  The same points were made more succinctly by shlevy in his Goodreads review: “that’s completely wrong and anyone with a modicum of familiarity with the field you’re talking about would know that.”

Part 1: Igon value problems.  

In a now-famous review of a Malcolm Gladwell book, Steven Pinker coined the phrase “igon value problem” to refer to Gladwell’s tendency to expound at length on various topics while also making very superficial errors: when discussing statistics, for example, Gladwell writes “igon values” instead of eigenvalues.

Bostrom’s Superintelligence is loaded with these “igon value” problems.  Chapters 1 and 2 are particularly bad for this, as they talk a lot about the history of AI and the current state of the field.

A few examples I recall offhand: Bostrom says genetic algorithms are stochastic hill climbers.  This isn’t true: the whole point of genetic algorithms is breeding/crossover, in order to avoid getting stuck in local optima (as hill climbers do).  If they were the same thing, it wouldn’t be worth the work to recast a problem for a genetic algorithm, because stochastic hill climbers are easy to write.
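(For the record, “easy to write” means something like this; a minimal sketch, where `objective` and `neighbors` are placeholders for whatever your problem defines:)

```python
import random

def stochastic_hill_climb(x, objective, neighbors, steps=1000):
    # Try one random neighbor per step and move only if it improves
    # the objective. Parks at the first local optimum it reaches.
    for _ in range(steps):
        candidate = random.choice(neighbors(x))
        if objective(candidate) > objective(x):
            x = candidate
    return x
```

Note what’s missing: no population, no crossover, and therefore no mechanism for jumping out of a local optimum once you’re in one.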

He says you can compare many types of algorithms because they are all doing “maximum likelihood estimation,” but most of the algorithms he lists can do more than maximum likelihood, and some of them are non-parametric (decision trees).

Bostrom says that machine learning algorithms are making mathematically well-specified trade-offs from an “ideal Bayesian agent,” which I’ve expounded on at length on my tumblr blog.  There is no controlled approximation to an “ideal Bayesian agent,” and no mathematical sense of “how far” a given model is from one.
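(To spell out what a mathematically well-specified relationship actually looks like, here are the standard definitions; none of this is from the book.  Maximum likelihood really is MAP estimation with a flat prior, but the “ideal Bayesian agent” keeps the whole posterior:)

```latex
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} p(D \mid \theta), \qquad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} p(D \mid \theta)\, p(\theta), \qquad
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta')\, p(\theta')\, d\theta'}
```

A decision tree isn’t the argmax of any likelihood, and there is no norm measuring its distance from that posterior; that’s the sense in which the claim isn’t well specified.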

These aren’t isolated mistakes, these igon value issues happen all over the place.  

Now, on one hand, these mistakes aren’t huge, and the broad thrust is mostly correct if you squint a bit and ignore the details (which are misleading or just outright wrong).

But at the same time, the rest of the book is speculation grounded only in Bostrom’s understanding of AI.  Do I trust someone with an igon value understanding of the field to reliably extrapolate from the state of the art today?  My answer is no, I don’t.

Ouch. Depending on how the rest of this series turns out, I may have to stop saying that while I’m down on MIRI, I have a lot of respect for Bostrom…

None of those errors (assuming Bostrom is being represented accurately) is anywhere close to the Igon Value problem and the comparison just sounds partisan.

Maybe Superintelligence is bad and Bostrom is dumb. But this post is not persuasive in that regard.

Okay, I’m no longer on mobile and I have my copy of Superintelligence in front of me. Page 8 of the hardcover edition:

In evolutionary models, a population of candidate solutions (which can be data structures or programs) is maintained, and new candidate solutions are generated randomly by mutating or recombining variants in the existing population. Periodically, the population is pruned by applying a selection criterion (a fitness function) that allows only the better candidates to survive into the next generation. Iterated over thousands of generations, the average quality of the solutions in the candidate pool gradually increases.

[…]

In practice, however, getting evolutionary methods to work well requires skill and ingenuity, particularly in devising a good representational format. Without an efficient way to encode candidate solutions (a genetic language that matches latent structure in the target domain), evolutionary search tends to meander endlessly in a vast search space or get stuck in a local optimum.

So Bostrom clearly groks genetic algorithms. (Certainly it reflects my experience with them.)
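(In code, the loop he’s describing is roughly the following; a minimal sketch, with fitness, mutate, and crossover standing in for the problem-specific choices Bostrom flags as the hard part:)

```python
import random

def evolve(population, fitness, mutate, crossover, generations=1000):
    # Breed new candidates by recombining and mutating existing ones,
    # then prune with a fitness-based selection criterion (keep the
    # better half), iterated over many generations.
    for _ in range(generations):
        children = [mutate(crossover(random.choice(population),
                                     random.choice(population)))
                    for _ in range(len(population))]
        pool = sorted(population + children, key=fitness, reverse=True)
        population = pool[:len(pool) // 2]
    return population[0]  # fittest surviving candidate
```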

Okay and here’s the passage @su3su2u1 took issue with, on page 9:

In fact, one of the major theoretical developments of the past twenty years has been a clearer realization of how superficially disparate techniques can be understood as special cases within a common mathematical framework.

[…]

In a similar manner, genetic algorithms can be viewed as performing stochastic hill-climbing, which is again a subset of a wider class of algorithms for optimization.

So I think the original criticism is even less fair than I’d thought. Disappointing.

So the first section looks like a textbook description of a genetic algorithm. 

The second section, the one you are quoting, is just flat-out wrong.  Stochastic hill climbing is a local search; genetic algorithms are broader than local search; and everything is a subset of “algorithms for optimization,” because that category is so vague as to be undefined.

It’s the sort of “igon value” wrong where, if you squint and move past it, it isn’t going to break the book.  Which is exactly my criticism.

“evolutionary search tends to meander endlessly in a vast search space or get stuck in a local optimum.”

Do you disagree?

I’m really just baffled here because I’ve pretty much always heard and read GAs discussed as stochastic hill-climber variants, e.g. as an alternative to simulated annealing.

The typical failure mode I tend to see is premature convergence: the population collapses too quickly onto points that aren’t optima at all (just arbitrary points).

Where are you seeing GAs discussed as hill climbers?  That’s just not what they are.  They are usually discussed as alternatives to hill climbers, for situations where you expect a hill climber to get stuck.  The whole point of mixing (breeding and mutation) is to perform a non-local search: you hope to bust the population out of local optima.  The typical use case for genetic algorithms is when you expect lots of local optima where hill climbers will get stuck.
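(Concretely, here’s why crossover is a non-local move; one-point crossover on bitstrings, a standard textbook example:)

```python
import random

def one_point_crossover(a, b):
    # Child takes a prefix from one parent and a suffix from the other.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

# Two parents sitting on two different "hills" of the landscape:
parent1 = "1111100000"
parent2 = "0000011111"
print(one_point_crossover(parent1, parent2))
# Depending on the cut, the child can be many bit flips away from BOTH
# parents -- a jump a single-flip hill climber will never make.
```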

The typical comparison is between genetic algorithms and simulated annealing because they are both commonly used on the same sorts of problems (global optimization in situations where there are lots of local optima). 

Evolutionary search is not the same as local search, they aren’t doing the same thing.  

But basically they are. The programmer’s choice of representational format is where the work happens (as Bostrom points out). Evolutionary search is local search with a population.

Wikipedia also classifies genetic algorithms as stochastic optimizers, FWIW.

I do not think there is a wikipedia page which takes as a given the Igon Value spelling. Your comparison remains unfair.

Stochastic optimizers are NOT stochastic hill climbers.  Any optimizer that uses a random variable is a stochastic optimizer.  You could make a stochastic hill climber by adding a random variable to a hill climber; now you have a hill climber that is a subclass of stochastic optimizers.

Simulated annealing will also be in there, probably under metaheuristics (or maybe randomized search).  Genetic algorithms will be in a similar category.  Is simulated annealing also a hill climber?  Are we going to abuse terminology that much?
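(For comparison, here’s the simulated annealing move rule; a minimal sketch, minimizing an energy, with geometric cooling, where energy and neighbor are problem-specific placeholders:)

```python
import math
import random

def simulated_annealing(x, energy, neighbor, t=1.0, cooling=0.999,
                        steps=10000):
    # Unlike a hill climber, SA sometimes accepts WORSE moves, with
    # probability exp(-delta/T), so it can escape local optima while
    # the temperature is high.
    for _ in range(steps):
        candidate = neighbor(x)
        delta = energy(candidate) - energy(x)
        if delta < 0 or random.random() < math.exp(-delta / t):
            x = candidate
        t *= cooling  # anneal: behave more greedily over time
    return x
```

It makes local moves, but the acceptance rule is the entire point of the method, so nobody files it under “hill climbers.”  Same deal with genetic algorithms and crossover.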

Hill climber = local search.  LOCAL search.  Not evolutionary search.  Words mean things. 

Genetic algorithms are like hill climbers in the same way bananas are like apples.  They are both fruit! So I wasn’t wrong when I said that the banana was an apple. 

Simulated annealing is much closer to genetic algorithms in use case than genetic algorithms are to hill climbers.  See literally any book on optimization algorithms.  See Wikipedia’s genetic algorithm and hill climbing pages.

If anything, Bostrom is guilty of not understanding that “hill climber” is a specific term of art: a method is a hill climber iff it uses a local gradient approximation.