I'm really starting to wonder about this "NoSQL movement". Everything I've read so far seems to have a severe case of tunnel vision - the only database anyone mentions in these NoSQL articles is MySQL, and they discuss how they have all these issues scaling it... and gee, it really sucks that you can't easily make schema changes all the time when you're running in production, or add indexes, or what have you.

Well kids, I hate to say it but:

a) Spend a little more time on your database designs. The more you think out your design, the fewer changes you'll need to make later.
b) Figure out how to actually make a RELATIONAL database, use primary keys, try to actually take advantage of all the features of the system.
c) Learn when to normalize and denormalize your tables.
d) You probably don't need to go to the database every single time your user makes a request, so try and be a tiny bit more clever about caching and database connections.
e) Oh, and last but not least... use a database that CAN SCALE, like Oracle or SQL Server. Yeah, I know, they cost money, but guess what? You get what you pay for. All the Fortune 50 are using Oracle and SQL Server in production, and they don't complain about how it's not scaling.

None of these issues seems to be a particularly good rationale for abandoning the RDBMS / ACID paradigm. I guarantee that nine times out of ten, developers who complain about the limitations of RDBMS could solve all of their problems by following any of the advice above, and the NoSQL technologies will only matter for extremely specialized cases. Look at what you sacrifice when you go this route:

"No schemas: NoSQL databases lack SQL's pre-defined table schemas, which makes changing data models simpler, but also offers no protection against invalid or outdated values in records." Are you high?
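To make the "no protection" part concrete, here is a minimal sketch using Python's built-in `sqlite3` module. A table with `NOT NULL` and `CHECK` constraints rejects a bad record at write time. A plain dict, standing in for a schemaless store (an assumption, not any particular NoSQL product), accepts whatever it is handed. The `accounts` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema states what a valid row looks like, and the database enforces it.
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        email   TEXT NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts (email, balance) VALUES ('a@example.com', 100)")

try:
    # Missing email and a negative balance: both violate the schema.
    conn.execute("INSERT INTO accounts (email, balance) VALUES (NULL, -50)")
    sql_rejected = False
except sqlite3.IntegrityError as exc:
    sql_rejected = True
    print("rejected by the schema:", exc)

# Stand-in for a schemaless store: any record shape is accepted, so the
# misspelled key and the negative balance go in unchallenged.
documents = {}
documents["acct:2"] = {"emial": None, "balance": -50}
print(sql_rejected, documents["acct:2"])
```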

"No, or limited, data joins: generally speaking there are no built-in methods for chaining requests together in the style of SQL joins. Data denormalization must happen at the application layer." Man, sucks to be an end user! Oh, and how does it "solve" the problem if you're just moving the work from one layer to another?
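The "moving the work" point is easy to see side by side. Below is a minimal Python sketch using the standard-library `sqlite3` module; the `customers` and `orders` tables are hypothetical. It computes the same result twice: first with a SQL `JOIN` inside the database, then by hand in application code, the way a store without joins would force you to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 30), (11, 2, 20), (12, 1, 5);
""")

# 1) Let the database do the join.
sql_rows = conn.execute("""
    SELECT o.id, c.name, o.total
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# 2) The application-layer equivalent: fetch both collections and
#    stitch them together with a hash lookup in Python.
customers = {cid: name for cid, name in conn.execute("SELECT id, name FROM customers")}
app_rows = [
    (oid, customers[cust_id], total)
    for oid, cust_id, total in conn.execute(
        "SELECT id, customer_id, total FROM orders ORDER BY id")
]

print(sql_rows == app_rows)  # True
```

The output is identical. What changes is who has to write, test and maintain the join logic.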

"Restricted query interfaces: SQL is a mature and powerful query language, and the APIs available for NoSQL systems are not always as flexible. But, there are new capabilities here as well, such as CouchDB's flexible map-reduced based views." Wow, who cares? I'd like to be able to do a GROUP BY and not be worried that it's not implemented properly.
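For comparison, here is a plain-Python sketch that sets a map/reduce-style aggregation next to a `GROUP BY`. It only mimics the general idea: a map function emits key/value pairs and a reduce function combines the values for each key. It is not CouchDB's actual view API, and the `sales` data is made up.

```python
import sqlite3
from collections import defaultdict

sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]

# SQL: a single declarative GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
sql_totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

# Map/reduce style: the map step emits (key, value) pairs for each record,
# and the reduce step combines all values that share a key.
def map_fn(record):
    region, amount = record
    yield region, amount

def reduce_fn(values):
    return sum(values)

grouped = defaultdict(list)
for record in sales:
    for key, value in map_fn(record):
        grouped[key].append(value)
mr_totals = {key: reduce_fn(values) for key, values in grouped.items()}

print(sql_totals == mr_totals)  # True
```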

I really don't see it. I mean, what problem are these guys solving, really? They're trying to create this whole new way of handling data that isn't robust, instead of using tools that actually work, just because they cost money? That's fine if you're a billion-dollar company like Google or Amazon, who probably don't actually care whether or not their NoSQL projects succeed, but why spend all this time and money chasing a solution that will probably take decades of development and tuning?


( 7 comments )
Feb. 24th, 2010 01:13 pm (UTC)
Actually, I think it's a lot scarier than you've set forth in your comments here. Based on what I've read about it (and don't think the term hasn't come out of a well-meaning client's mouth at one point), the goal seems to be to boost horizontal scalability. As far as I can tell, the goal is to be able to add new nodes to a database cluster without worrying about things like "synchronization" and "consistent recordset returns." It seems like they have a methodology that works for situations where you don't need the information you return to be complete, or even up-to-date, as long as you return something.

That makes sense if you're generating Facebook's News Feed, or a set of recommendations for Amazon -- you can probably just take whatever the server that receives the request thinks is the correct response to that query, and if it's a few hours or days out of date, so be it. If you're dealing with, say, multidivisional corporate finance management, that's horrifically bad.
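The trade-off described here, where a node answers with whatever it currently has even if that is out of date, can be shown with a toy Python simulation. It assumes a single primary that applies writes immediately and a replica that only catches up when a replication step runs. The `Primary` and `Replica` classes are hypothetical and not modeled on any specific product.

```python
class Replica:
    """Holds a copy of the data that is only updated when replication runs."""
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Applies writes immediately and queues them for asynchronous replication."""
    def __init__(self, replica):
        self.data = {}
        self.replica = replica
        self.pending = []            # writes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def read(self, key):
        return self.data.get(key)

    def replicate(self):
        """Ship queued writes to the replica, in order."""
        for key, value in self.pending:
            self.replica.data[key] = value
        self.pending.clear()


replica = Replica()
primary = Primary(replica)

primary.write("balance:42", 100)
primary.replicate()
primary.write("balance:42", 40)      # accepted, but not yet replicated

stale = replica.read("balance:42")   # a read that happens to hit the replica
fresh = primary.read("balance:42")
print(stale, fresh)                  # 100 40

primary.replicate()                  # eventually the replica converges
print(replica.read("balance:42"))    # 40
```

For a news feed, briefly showing 100 is harmless. For a ledger, it is exactly the "horrifically bad" case.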

On the upside, so far this technology has really only been developed for special-case applications that can actually leverage it for some useful purpose. For the rest of us, there are MSSQL and Oracle and DB2 clusters running CDC-based near-real-time synchronization. Sure, it costs a million dollars a year to maintain, but once you're at the stage of needing that kind of solution, a million bucks a year isn't that much.
Feb. 24th, 2010 01:39 pm (UTC)
I guess companies with no business model who make no money have a different set of requirements. :)
Feb. 24th, 2010 01:58 pm (UTC)

Wait. So you have no schemas so you can add tables and stuff willy-nilly, but you don't have any joins so you can't link any of the tables/data together?

That's just-- oh, wait, I get it. There's no point in having joins because it will keep returning duplicate data because without schemas you're going to be storing the same data over and over again in different places because you can't track it.

Actually, no that's wrong, you will need the joins because while you'll be storing "LastName" everywhere, no two values will ever be the same even for two supposedly identical people. So the joins help you see how fubared your data is...so...I guess then you don't need it after all.

Man these guys are good.

Frankly, I think pretty much every major problem in computer science has already been solved -- and mostly by guys who didn't actually have a computer to work with. Sure, there are still new algorithms out there but for what most people want to do? There's a book written by a guy who was dead before you were born that explains how to solve it.

Feb. 24th, 2010 02:21 pm (UTC)
I'm actually seriously concerned that this may be the effect of the "anybody can be a programmer" phenomenon in industry. Because so many people have no formal training in computer science, you end up with all these hotshots who think they're doing amazing stuff, yet they're re-inventing the wheel, except what they're making is actually a cube instead of a cylinder/torus.

I can't tell you how many times I keep running into people who started out as script kiddies or child hobbyists, got a non-technical degree, and are now highly paid consultants who couldn't tell you what recursion or encapsulation are, let alone how to design databases to 4NF.
Feb. 25th, 2010 02:42 am (UTC)
NoSQL isn't a replacement for SQL; it's just another tool alongside it. NoSQL solves the persistent-data, random-access, high-speed-access problem. Yes, you can make SQL do key-value pairs, but even spending money on Oracle etc., you're not going to be able to pull tens of thousands of records in tens of milliseconds. And yes, you could just put a Memcache layer on top of SQL, but now you're double-writing, you get unneeded evictions because of Memcache's crappy slab setup, plus all the standard cache-stampede issues, and on top of it all, Memcache still isn't that fast.

I'd love to use pretty and perfect SQL for everything, but sometimes it's just not the right tool for the job, and when you're talking huge web-scale apps, it just can't keep up.
Feb. 25th, 2010 06:09 am (UTC)
Nope, that's Kool-Aid you're drinking there, son.
Feb. 25th, 2010 06:19 am (UTC)
And the reason why I know that you're drinking Kool-Aid is because I know for a fact that Goldman Sachs isn't using NoSQL, they're using good old RDBMSs. You think "huge web-scale apps" compares at all to the transaction volumes and writes that securities banks or insurance companies deal with? You think that consistency might be a tiny bit important for what they're doing? You need to put down the NoSQL crack pipe.
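In practice, that consistency means atomic, all-or-nothing updates. Here is a minimal sketch with Python's `sqlite3` module; the `accounts` table and the `transfer` function are hypothetical. In the failing case, the transfer credits the destination account first, then fails the `CHECK` constraint on the overdrawn source. The whole transaction is rolled back, so no money is created or lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts atomically; return True if committed."""
    try:
        # "with conn" commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

print(transfer(conn, 1, 2, 30), balances(conn))   # True {1: 70, 2: 80}
print(transfer(conn, 1, 2, 500), balances(conn))  # False {1: 70, 2: 80}
```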