cloudflare/SortaSQL
Public, mirrored from https://github.com/cloudflare/SortaSQL
SortaSQL

An extension to PostgreSQL allowing Kyoto Cabinets to be used as a backing data store.
Currently very customized for CloudFlare's usage pattern of online log analysis.

Install

Update the paths in common.h

protoc-c --c_out=c entry.proto
protoc --cpp_out=cpp entry.proto
make
sudo make install

About

Google has BigTable and GFS.
Facebook has HBase and Cassandra.
And then there's the up-and-coming set of Membase, CouchDB, MongoDB, and so on and so forth.

These all scale by eliminating flexibility, presenting a user with a key-value interface where keys are ordered to allow efficient iteration.
But by throwing out the relational part of a relational database, you lose a lot of value and flexibility in looking at data, not to mention all of the nice language bindings that old-school RDBMSs have.
Compounding this, NoSQL DBs are also all new, all have bugs, and no one has that much experience managing them.
It's a brave new world out there.

At CloudFlare, we spent quite a while evaluating different NoSQL solutions.
We were looking for a reliable and cheap way to store Big Data down the road, while storing Medium Data right now.
We need to be able to scale with one technology, going from a single machine to two machines, to four, and so on up to a full cluster.
The catch is we don't want to buy that cluster right now.

We messed with HBase and Hive in particular, but weren't able to get reliable performance from these while running on a scaled-down infrastructure.
For a while we fell back on Postgres, but pretty soon it was showing signs of creaking under the load.
But we still weren't ready to make the jump to a big rack full of servers.

Needing a middle ground, we created a hybrid model using both SQL and NoSQL technologies.
In our existing Postgres DB, we created a custom data type which acts as a pointer into an embedded Kyoto Cabinet DB.
Kyoto Cabinet is a collection of C functions for looking up and setting records in a simple data file, each record consisting of a key/value pair.
Adding a few functions and operators to Postgres allows these KC files to be searched using both Postgres's explicit indexes and KC's implicit indexes (keys are ordered and can be stored in a B-tree).
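
The hybrid lookup described above can be sketched roughly as follows. This is an illustration of the idea only, not SortaSQL's actual data type or operators, which this README does not show. SQLite stands in for Postgres, an in-memory sorted list stands in for a Kyoto Cabinet file, and the table, column, file, and function names (zone_logs, kc_file, kc_prefix, logs-2011-03.kct, kc_scan, lookup) are all hypothetical.

```python
import bisect
import sqlite3

# Stand-in for Kyoto Cabinet files: file name -> list of (key, value) records
# kept in key order. A real KC tree database keeps keys ordered on disk.
kc_files = {
    "logs-2011-03.kct": sorted([
        ("zone1:0301", "120 hits"),
        ("zone1:0302", "95 hits"),
        ("zone2:0301", "7 hits"),
    ]),
}


def kc_scan(file_name, prefix):
    """Return every (key, value) record in file_name whose key starts with prefix."""
    records = kc_files[file_name]
    # Because keys are ordered, jump straight to the first candidate key.
    start = bisect.bisect_left(records, (prefix,))
    matches = []
    for key, value in records[start:]:
        if not key.startswith(prefix):
            break  # past the end of the prefix range
        matches.append((key, value))
    return matches


# Stand-in for the relational side: each row carries a "pointer"
# (file name plus key prefix) into the key-value store.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE zone_logs (zone_id INTEGER, month TEXT, kc_file TEXT, kc_prefix TEXT)"
)
db.execute("CREATE INDEX zone_logs_idx ON zone_logs (zone_id, month)")
db.executemany(
    "INSERT INTO zone_logs VALUES (?, ?, ?, ?)",
    [
        (1, "2011-03", "logs-2011-03.kct", "zone1:"),
        (2, "2011-03", "logs-2011-03.kct", "zone2:"),
    ],
)


def lookup(zone_id, month):
    """Narrow rows with the relational index, then follow each pointer into the KV data."""
    rows = db.execute(
        "SELECT kc_file, kc_prefix FROM zone_logs WHERE zone_id = ? AND month = ?",
        (zone_id, month),
    ).fetchall()
    results = []
    for kc_file, kc_prefix in rows:
        results.extend(kc_scan(kc_file, kc_prefix))
    return results


zone1_records = lookup(1, "2011-03")
```

The relational table stays small because it holds only pointers, while the bulk of the data lives in the key-value file and is reached through an ordered key scan.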
Storing values serialized using Google's Protocol Buffers allows complex structures to be added as values, while not needing much disk space.
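
One reason Protocol Buffers values stay small is that the wire format stores integers as base-128 varints, so small numbers take few bytes. The sketch below hand-rolls that encoding for illustration. It is not the entry.proto schema and does not use the protobuf library; the function names and the sample value are assumptions made for the example.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if n < 0:
        raise ValueError("this sketch only handles unsigned values")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data):
    """Decode a single varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


hits = 300                      # hypothetical counter value
encoded = encode_varint(hits)   # two bytes, versus three for the text "300"
```

Values below 128 fit in a single byte, which suits records made up mostly of small counters.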

See also:

http://www.postgresql.org/
http://fallabs.com/kyotocabinet/

Copyright 2011 CloudFlare, Inc.