Carl de Marcken: Inside Orbitz

a friend, describing their experiences using Lisp in one of the software

industry’s most demanding

applications.)Date: Fri, 12 Jan 2001 15:42:34 -0500

From: Carl de MarckenGeoffrey,Here are some tidbits…1. Right now Sabre, Galileo, Amadeus and Worldspan operate many

millions of dollars of IBM and Unisys mainframes each to

answer the vast majority of queries done by airline phone agents,

airport desk agents, travel agents, and travel web sites (other than

our own and our customers’). Their computers are housed in

bomb-proof, fire-walled (literally) complexes in Kansas City, Denver,

Germany and Atlanta, and mostly run assembly language code for

performance reasons. From what we can discern, their algorithms are

basic: until we pointed it out to them I don’t think they had any

understanding of how hard the problem they’re trying to solve is, or

how far their solutions are from optimal.2. ITA Software is slowly replacing the industry’s hardware and

software with Common Lisp code running on Linux PCs, that uses

relatively involved algorithms that show off our academic CS

background. The easiest place to see the code in action is on our web

site,

www.itasoftware.com.3. The vast majority of our “thinking” code is in Common Lisp. We run

both

CMUCL and

Franz,

under Linux/Intel, HPUX/PA, and NT/Intel, and

have about 200,000 lines of Lisp in our base search engine. Our web

site page generation code is also largely written in Common Lisp,

though there’s also fair bit of Java there.4. Because we have about 2 gigs of static data we need rapid access

to, we use C++ code to memory-map huge files containing pointerless C

structs (of flights, fares, etc), and then access these from Common

Lisp using foreign data accesses. A struct field access compiles into

two or three instructions, so there’s not really any performance.

penalty for accessing C rather than Lisp objects. By doing this, we

keep the Lisp garbage collector from seeing the data (to Lisp, each

pointer to a C object is just a fixnum, though we do often temporarily

wrap these pointers in Lisp objects to improve debuggability). Our

Lisp images are therefore only about 250 megs of “working” data

structures and code.5. Every query that hits our site gets sent via tcpip to a Lisp

process running on an dual 800mhz x86 Linux box with 2g of ram ($3000,

vs about $1,000,000 for a similarly capable mainframe), and the

process devotes between 5 and 15 seconds of CPU time to it. One of

our customers will have 200 such boxes, each running 2 or 3 Lisp

processes. We save on ram by putting multiple processes on one box,

since the virtual memory system automatically shares our read-only

memory-mapped files between processes.6. If you want to do a simple round-trip from BOS to LAX in two weeks,

coming back in three, willing to entertain a 24 hour departure window

for both parts, then limiting to “reasonable” routes (at most 3

flights and at most 10 hours or so) you have about 5,000 ways to get

there and 5,000 ways to get back. Listing them is a mostly trivial

graph-search (there are a few minor complications, but not many), that

anybody could do in a fraction of a second.7. The real challenge is that a single fixed itinerary (a fixed set of

flights from BOS to LAX and a fixed set back) with only two flights in

each direction may have more than 10,000 possible combinations of

applicable “fares”, each fare with complex restrictions that must be

checked against the flights and the other fares. That means that the

search space for this simple trip is of the order 5000 x 5000 x 10000,

and a naive program would need to do a lot of computation just to

validate each of these possibilities. Suitably formalized, its not

even clear that the problem of finding the cheapest flight is

NP-complete, since it is difficult to put a bound on the size of the

solution that will result in the cheapest price. If you’re willing to

dispense with restrictions on the energy in the universe, then it is

actually possible to formalize the cheapest-price problem in a

not-too-unreasonable way that leads to a proof of undecidability by

reduction to the Post correspondance problem :-).8. Our Lisp code is running very clever algorithms that let us produce

in a reasonable time a data structure we call the “pricing-graph” from

which we can very efficiently answer a query of the form “give me the

k-th best solution (a validated set of flights and fares), ordered

according to the function f”, assuming of course certain restrictions

on f, where the number of answers represented by the pricing-graph is

10^20 - 10^30 depending on the type of trip. In this way, we can

reasonably claim that in 10 seconds we can produce 10^30 answers, even

if we could not possibly enumerate the list of such answers.9. We can do 10 seconds of Lisp computation on a 800mhz box and cons

less than 5k of data. This is because we pre-allocate all data

structures we need and die on queries that exceed them. This may make

many Lisp programmers cringe, but with a 250 meg image and real-time

constraints, we can’t afford to generate garbage. For example, rather

than using cons, we use “cons!”, which grabs cells from an array of

10,000,000 cells we’ve preallocated and which gets reset every query.10. A lot of our Lisp is designed to compile into very efficient

assembly. We make a lot of use of Lisp’s macro capabilities, but shy

away from many other Lisp features like closures, generic functions,

complex sequence functions and garbage collection. We’re doing an

incredible amount of computation - getting 10 seconds on a modern

machine is an incredible gift - but if we’re sloppy at all 10 seconds

can turn into ten minutes, not adequate for a travel agent or web

site. We disassemble most every Lisp function looking for

inefficiencies and have had both CMUCL and Franz enhanced to compile

our code better.11. Occasionally we’ve had to move code from Lisp to C++, usually

because of data loading issues (Lisp garbage collectors just can’t

deal with gigs of data, and there’s no way to rapidly load gigs of

data into a Lisp). Our experience has been a 10 to 1 code expansion;

I don’t think there are any programmers in our company that regret the

choice of Common Lisp.12. We’ve had very little trouble getting non-Lisp programmers to read

and understand and extend our Lisp code. The only real problem is

that the training most programmers have in Lisp has taught them to

code very inefficiently, without paying any attention to the compiler.

Of course, with things like STL and Java, I think programmers of other

languages are also becoming pretty ignorant.

Date: Tue, 15 Jan 2002 17:49:04 -0800

From: Carl de MarckenPaul,I don’t have any problems with it going up on a site, but

please make a note that this message is old and the world is constantly

changing: we now have thousands of CPUs running our code, and various

airlines and major web sites (Orbitz, e.g.) depending on it. The mainframes

are disappearing as our stuff replaces it.

https://paulgraham.com/carl.html

📚 返回 Paul Graham 文章目录

Carl de Marcken在Orbitz内部

Paul Graham关于Carl de Marcken在Orbitz内部工作的文章

Carl de Marcken: Inside Orbitz