Home News Feeds Planet MySQL
Newsfeeds
Planet MySQL
Planet MySQL - http://www.planetmysql.org/

  • Liveblogging at Confoo: Blending NoSQL and SQL
    Persistence Smoothie: Blending NoSQL and SQL – see user feedback and comments at http://joind.in/talk/view/1332. Michael Bleigh from Intridea, high-end Ruby and Ruby on Rails consultants, build apps from start to finish, making it scalable. He’s written a lot of stuff, available at http://github.com/intridea. @mbleigh on twitter NoSQL is a new way to think about persistence. Most NoSQL systems are not ACID compliant (Atomicity, Consistency, Isolation, Durability). Generally, most NoSQL systems have: Denormalization Eventual Consistency Schema-Free Horizontal Scale NoSQL tries to scale (more) simply, it is starting to go mainstream – NY Times, BBC, SourceForge, Digg, Sony, ShopWiki, Meebo, and more. But it’s not *entirely* mainstream, it’s still hard to sell due to compliance and other reasons. NoSQL has gotten very popular, lots of blog posts about them, but they reach this hype peak and obviously it can’t do everything. “NoSQL is a (growing) collection of tools, not a new way of life.” What is NoSQL? Can be several things: Key-Value Stores Document Databases Column-oriented data stores Graph Databases Key-Value Stores memcached is a “big hash in the sky” – it is a key value store. Similarly, NoSQL key-value stores “add to that big hash in the sky” and store to disk. Speaker’s favorite is Redis because it’s similar to memcached. key-value store + datatypes (list, sets, scored sets, soon hashes will be there) cache-like functions (like expiration) (Mostly) in-memory Another interesting key-value store is Riak Combination of key-value store and document database heavy into HTTP REST You can create links between documents, and do “link walking” that you don’t normally get out of a key-value store built-in Map Reduce Map Reduce: Massively parallel way to process large datasets First you scour data and “map” a new set of dataM Then you “reduce” the data down to a salient result — for example, map reduce function to make a tag cloud: map function makes an array with a tag name and a count of 1 for each instance of that tag, and the reduce tag goes through that array and counts them… http://en.wikipedia.org/wiki/MapReduce Other key-value stores: Tokyo Cabinet Dynomite memcachedDB Voldemort Document Databases Some say that it’s the “closest” thing to real SQL. MongoDB – Document store that speaks BSON (Binary JSON, which is compact). This is the speaker’s favorite because it has a rich query syntax that makes it close to SQL. Can’t do joins, but can embed objects in other objects, so it’s a tradeoff Also has GridFS that can store large files efficiently, can scale to petabytes of data does have MapReduce but it’s deliberate and you run it every so often. CouchDB Pure JSON Document Store – can query directly with nearly pure javascript (there are auth issues) but it’s an interesting paradigm to be able to run your app almost entirely through javascript. HTTP REST interface MapReduce only to see items in CouchDB. Incremental MapReduce, every time you add or modify a document, it dynamically changes the functions you’ve written. You can do really powerful queries as easy as you can do simple queries. However, some things are really complex, ie, pagination is almost impossible to do. Intelligent Replication – CouchDB is designed to work with offline integration. Could be used instead of SQLite as the HTML5 data store, but you need CouchDB running locally to be doing offline stuff w/CouchDB Column-oriented store Columns are stored together (ie, names) instead of rows. Lets you be schema-less because you don’t care about a row’s consistency, you can just add a column to a table very easily. Cassandra – Built by Facebook, also used by Twitter BigTable Hypertable HBase Graph Databases speaker’s opinion – there aren’t enough of these. Neo4J – can handle modeling complex relationships – “friends of friends of cousins” but it requires a license. When should I use this stuff? If you have:Use Complex, slow joins for an “activity stream”Denormalize, use a key-value store. Variable schema, vertical interactionDocument database or column store Modeling multi-step relationships (linkedin, friends of friends, etc)Graph Don’t look for a single tool that does every job. Use more than one if it’s appropriate, weigh the tradeoffs (ie, don’t have 7 different data stores either!) NoSQL solves real scalability and data design issues. But financial transactions HAVE to be atomic, so don’t use NoSQL for those. A good presentation is http://www.slideshare.net/bscofield/the-state-of-nosql. Using SQL and NoSQL together Why? Well, your data is already in an SQL database (most likely). You can blend by hand, but the easy way is DataMapper: Generic, relational ORM (adapters for many SQL dbs and many NoSQL stores) Implements Identity Map Module-based inclusion (instead of extending from a class, you just include into a class). You can set up multiple data targets (default is MySQL, example sets up MongoDB too). DataMapper is: Ultimate Polyglot ORM simple r’ships btween persistence engines are easy jack of all, master none Sometimes perpetuates false assumptions – If you’re in Ruby, your legacy stuff is in ActiveRecord, so you’re going to have to rewrite your code anyway. Speaker’s idea to be less generic and better use of features of each data store – Gloo – “Gloo glues together different ORMs by providing relationship proxies.” this software is ALPHA ALPHA ALPHA. The goal is to be able to define relationships on the terms of any ORM from any class, ORM or not Right now – partially working activeRecord relationships Is he doing it wrong? Is it a crazy/stupid idea? Maybe. Example: NeedUse Assume you already have an auth systemit’s already in SQL, so leave it there. Need users to be able to purchase items from the storefront – Can’t lose transactions, need full ACID complianceuse MySQL. Social Graph – want to have activity streams and 1-way and 2-way relationships. Need speed, but not consistencyuse Redis Product Listings — selling moves and books, both have different properties, products are pretty much non-relationaluse MongoDB He wrote the example in about 3 hours, so integration of multiple data stores can be done quickly and work.

  • Liveblogging at Confoo: [not just] PHP Performance by Rasmus Lerdorf
    Most of this stuff is not PHP specific, and Python or Ruby or Java or .NET developers can use the tools in this talk. The session on joind.in, with user comments/feedback, is at http://joind.in/talk/view/1320. Slides are at http://talks.php.net/show/confoo10 “My name is Rasmus, I’ve been around for a long time. I’ve been doing this web stuff since 1992/1993.” “Generally performance is not a PHP problem.” Webservers not config’d, no expire headers on images, no favicon. Tools: Firefox/Firebug extension called YSlow (developed by yahoo) gives you a grade on your site. Google has developed the Firefox/Firebug pagespeed tool. Today Rasmus will pick on wordpress. He checks out the code, then uses Siege to do a baseline benchmark — see the slide for the results. Before you do anything else install an opcode cache like APC. Wordpress really likes this type of caching, see this slide for the results. Set the timezone, to make sure conversions aren’t being done all the time. Make sure you are cpu-bound, NOT I/O bound. Otherwise, speed up the I/O. Then strace your webserver processs. There are common config issues that you can spot in your strace code. grep for ENOENT which shows you “No such file or directory” errors. AllowOverride None to turn off .htaccess for every directory, just read settings once from your config file….(unless you’re an ISP). Make sure DirectoryIndex is set appropriately, watch your include_path. All this low-hanging fruit has examples on the common config issues slide. Install pecl/inclued and generate a graph – here is the graph image (I have linked it because you really want to zoom in to the graph…) In strace output check the open() calls. Conditional includes, function calls that include files, etc. need runtime context before knowing what to open. In the example, every request checks to see if we have the config file, once we have config’d we can get rid of that stuff. Get rid of all the conditionals and hard-code “include wp-config.php”. Examples are on the slide. His tips to change: Conditional config include in wp-load.php (as just mentioned) Conditional did-header check in wp-blog-header.php Don’t call require_wp_db() from wp-settings.php Remove conditional require logic from wp_start_object_cache Then check strace again, now all Rasmus sees is theming and translations, which he decided to keep, because that’s the good benefit of Wordpress – Performance is all about costs vs. flexibility. You don’t want to get rid of all of your flexibility, but you want to be fast. Set error_reporting(-1) in wp-settings.php to catch all warnings — warnings slow you down, so get rid of all errors. PHP error handling is very slow, so getting rid of errors will make you faster. The slide of warnings that wordpress throws. Look at all C-level calls made, using callgrind, which sits under valgrind, a CPU emulator used for debugging. See the image of what callgrind shows. Now dive into the PHP executor, by installing XDebug. Check xhprof – Facebook open sourced this about a year ago, it’s a PECL extension. The output is pretty cool, try it on your own site, Rasmus does show you how to use it. It shows you functions sorted by the most expensive to the least expensive. For example, use $_SERVER[REQUEST_TIME] instead of time(). Use pconnect() if MySQL can handle the amount of webserver connections that will be persistent, etc. After you have changed a lot of the stuff above, benchmark again with siege to see how much faster you are. In this case there is not much gained so far. So keep going….the blogroll is very slow — Rasmus gets rid of it by commenting out in the sidebar.php file. I’d like to see something to make it “semi-dynamic” — that is, make it a static file that can be re-generated, since you might want the blogroll but links are not changed every second….. At this point we’re out of low-hanging fruit. HipHop is a PHP to C++ converter & compiler, including a threaded, event-driven server that replaces apache. Rasmus’ slide says “Wordpress is well-suited for HipHop because it doesn’t have a lot of dynamic runtime code. This is using the standard Wordpress-svn checkout with a few tweaks.” Then, of course, benchmark again. The first time you compile Wordpress with HipHop, you give it a list of files to add to the binary, it will complain about php code that generate file names, so you do have to fix that kind of stuff. There’s a huge mess of errors the first time you run it (”pages and pages”), and Rasmus had to patch HipHop (and Wordpress) but the changes in HipHop have been put back into HipHop, so you should be good for the most part. Check out the errors, lots of them show logical errors like $foo.”bar” instead of $foo.=”bar” and $foo=”bar” instead of $foo==”bar” in an if statement. Which of course is nice for your own code, to find those logical errors. (Wordpress takes in a $user_ID argument and immediately initializes a global $user_ID variable, which overwrites the argument passed in, so you can change the name of the argument passed in….) You can also get rid of some code, things that check for existence of the same thing more than once. So it will take a bit of tweaking, but it’s worth it. There are limitations to HipHop, for example: It doesn’t support any of the new PHP 5.3 language features Private properties don’t really exist under HipHop. They are treated as if they are protected instead. You can’t unset variables. unset will clear the variable, but it will still be in the symbol table. eval and create_function are limited Variable variables $$var are not supported Dynamic defines won’t work: define($name,$value) get_loaded_extensions(), get_extension_funcs(), phpinfo(), debug_backtrace() don’t work Conditional and dynamically created include filenames don’t work as you might expect Default unix-domain socket filename isn’t set for MySQL so connecting to localhost doesn’t work and HipHop does not support all extensions — see the list Rasmus has of extensions HipHop supports. Then Rasmus showed an example using Twit (which he wrote) including the benchmarks. He shows that you can see what’s going on, like 5 MySQL calls on the home page and what happens when you don’t have a favicon.ico (in yellow). In summary, “performance is all about architecture”, “know your costs”. Be careful, because some tools (like valgrind and xdebug) you don’t want to put it on production systems, you could capture production traffic and replay it on a dev/testing box, but “you just have to minimize the differences and do your best”.

  • Writing A Storage Engine for Drizzle, Part 2: CREATE TABLE
    The DDL code paths for Drizzle are increasingly different from MySQL. For example, the embedded_innodb StorageEngine CREATE TABLE code path is completely different than what it would have to be for MySQL. This is because of a number of reasons, the primary one being that Drizzle uses a protobuf message to describe the table format instead of several data structures and a FRM file. We are pretty close to having the table protobuf message format being final (there’s a few bits left to clean up, but expect them done Real Soon Now (TM)). You can see the definition (which is pretty simple to follow) in drizzled/message/table.proto. Also check out my series of blog posts on the table message (more posts coming, I promise!). Drizzle allows either your StorageEngine or the Drizzle kernel to take care of storage of table metadata. You tell the Drizzle kernel that your engine will take care of metadata itself by specifying HTON_HAS_DATA_DICTIONARY to the StorageEngine constructor. If you don’t specify HTON_HAS_DATA_DICTIONARY, the Drizzle kernel stores the serialized Table protobuf message in a “table_name.dfe” file in a directory named after the database. If you have specified that you have a data dictionary, you’ll also have to implement some other methods in your StorageEngine. We’ll cover these in a later post. If you ever dealt with creating a table in MySQL, you may recognize this method: virtual int create(const char *name, TABLE *form, HA_CREATE_INFO *info)=0; This is not how we do things in Drizzle. We now have this function in StorageEngine that you have to implement: int doCreateTable(Session* session, const char *path,                   Table& table_obj,                  drizzled::message::Table& table_message) The existence of the Table parameter is largely historic and at some point will go away. In the Embedded InnoDB engine, we don’t use the Table parameter at all. Shortly we’ll also get rid of the path parameter, instead having the table schema in the Table message and helper functions to construct path names. Methods name “doFoo” (such as doCreateTable) mean that there is a method named foo() (such as createTable()) in the base class. It does some base work (such as making sure the table_message is filled out and handling any errors) while the “real” work is done by your StorageEngine in the doCreateTable() method. The Embedded InnoDB engine goes through the table message and constructs a data structure for the Embedded InnoDB library to create a table. The ARCHIVE storage engine is much simpler, and it pretty much just creates the header of the ARZ file, mostly ignoring the format of the table. The best bet is to look at the code from one of these engines, depending on what type of engine you’re working on. This code, along with the table message definition should be more than enough

  • Surveying MySQL’s Popular Storage Engines
    In this month’s Database Journal piece we look at the spectrum of MySQL storage engines available, and examine what some of their strengths and weaknesses are. View the article here: Survey of MySQL Storage Engines

  • Emulating a 'top' CPU summary using /proc/stat and MySQL
    In my last blog post, I showed how we can get some raw performance information from /proc into the MySQL database using a LOAD DATA INFILE (LDI) command. I've modified that LDI call slightly to set the `other` column to equal the sum total of the CPU counters for those rows which begin with 'cpu'.original: other = IF(@the_key like 'cpu%', NULL , @val1);new: other = IF(@the_key like 'cpu%', user + nice + system + idle + iowait + irq + softirq + steal + guest, @val1);Top provides a useful output that looks something like the following: top - 04:59:14 up 14 days, 3:34, 1 user, load average: 0.00, 0.00, 0.00 Tasks: 216 total, 1 running, 215 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8172108k total, 5115388k used, 3056720k free, 315180k buffers Swap: 2097144k total, 0k used, 2097144k free, 3630748k cached The information I'm currently concerned with presenting is the CPU summary: Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st In order to emulate this display, we will need to sample two data points from /proc/stat. Load the data from proc_stat Sleep 1 second Load the data again Compare the valuesYou should end up with something similar to the following: mysql> select * from test.proc_stat where the_key = 'cpu'; +-----+---------+--------+-------+--------+------------+--------+------+---------+-------+-------+------------+ | seq | the_key | user | nice | system | idle | iowait | irq | softirq | steal | guest | other | +-----+---------+--------+-------+--------+------------+--------+------+---------+-------+-------+------------+ | 1 | cpu | 440022 | 36207 | 94583 | 1976124562 | 89082 | 858 | 27243 | 0 | 0 | 1976812557 | | 24 | cpu | 440024 | 36207 | 94583 | 1976130493 | 89082 | 858 | 27243 | 0 | 0 | 1976818490 | +-----+---------+--------+-------+--------+------------+--------+------+---------+-------+-------+------------+ 2 rows in set (0.00 sec) To display the CPU utilization, run the following query: select 100 * ( ( new.user - old.user ) / ( new.other - old.other ) ) user, 100 * ( ( new.nice - old.nice ) / ( new.other - old.other ) ) nice, 100 * ( ( new.system - old.system ) / ( new.other - old.other ) ) system, 100 * ( ( new.idle - old.idle ) / ( new.other - old.other ) ) idle, 100 * ( ( new.iowait - old.iowait ) / ( new.other - old.other ) ) iowait, 100 * ( ( new.irq - old.irq ) / ( new.other - old.other ) ) irq, 100 * ( ( new.softirq - old.softirq ) / ( new.other - old.other ) ) softer, 100 * ( ( new.steal - old.steal ) / ( new.other - old.other ) ) steal, 100 * ( ( new.guest - old.guest ) / ( new.other - old.other ) ) guest from test.proc_stat old, test.proc_stat new where new.seq > old.seq and old.the_key = 'cpu' and new.the_key = old.the_key; +--------+--------+--------+---------+--------+--------+--------+--------+--------+ | user | nice | system | idle | iowait | irq | softer | steal | guest | +--------+--------+--------+---------+--------+--------+--------+--------+--------+ | 0.0337 | 0.0000 | 0.0000 | 99.9663 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | +--------+--------+--------+---------+--------+--------+--------+--------+--------+ 1 row in set (0.01 sec) edit: for completeness sake, here is the SQL script I execute to load the data from proc: CREATE TABLE IF NOT EXISTS test.proc_stat ( seq tinyint auto_increment primary key, the_key char(25) NOT NULL, user bigint, nice bigint, system bigint, idle bigint, iowait bigint, irq bigint, softirq bigint, steal bigint, guest bigint, other bigint ); /* MySQL treats consecutive delimiters as separate fields, so some fancy footwork is required to load the file successfully. The file includes a cpu field followed by two spaces which is the sum of all the individual CPUs in the system. To account for this each row is read into some MySQL variables. Those variables are examined to determine which field holds the correct value. */ LOAD DATA INFILE '/proc/stat' IGNORE INTO TABLE test.proc_stat FIELDS TERMINATED BY ' ' (@the_key, @val1, @val2, @val3, @val4, @val5, @val6, @val7, @val8, @val9, @val10) SET the_key = @the_key, user = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val1, 0), IFNULL(@val2,0))), nice = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val2, 0), IFNULL(@val3,0))), system = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val3, 0), IFNULL(@val4,0))), idle = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val4, 0), IFNULL(@val5,0))), iowait = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val5, 0), IFNULL(@val6,0))), irq = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val6, 0), IFNULL(@val7,0))), softirq = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val7, 0), IFNULL(@val8,0))), steal = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val8, 0), IFNULL(@val9,0))), guest = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val9, 0), IFNULL(@val10,0))), other = IF(@the_key like 'cpu%', user + nice + system + idle + iowait + irq + softirq + steal + guest, @val1);

Bicolor template supported by Naturalife Greenworld