Welcome to the blog of Marcel Oelke from Frankfurt, Germany.

Archive for MySQL

Who is the Subversion King in your Company?

Have you ever wanted to know who’s the top committer in your company?
In my previous company we etablished the term “CVS King”, a title comparable to “Employee of the month”. The developer with the most cvs commits was the “CVS King of the month”. We determined who was the “CSV King” using commit emails that were sent to all developers on each cvs commit.
Two years ago we switched to Subversion, so now we’re talking about the “Subversion King”. Naturally all this is anything but serious ;)

Anyway, today i programmed a little php script that uses a different approach to determine who is the “Subversion King of the Month”. It’s counting the line delta directly from the svn repository using svnlook. So the developer with the most lines added to the repository (not the most commits) is the number one.
Here’s some example output (names are anonymized):

--- Most productive users for 11/2007 ---
1   usera                6319      42.47%
2   userb                5797      38.96%
3   userc                1990      13.37%
4   userd                 773      05.20%
--- Most active commiters for 11/2007 ---
1   usera                  47         47%
2   userb                  34         34%
3   userc                   4          4%
4   userd                   2          2%
-----------------------------------------

The script reads all commits (revisions) for the current month, counts the line delta (how many lines have been added/removed) and the amount of commits.
You can view the highlighted source here or download the script here. Use it on the command line like that:
php -f /path/to/svn/repository

I know that the number of lines comitted may not be the one and only criteria to measure the productivity of a developer. But its an indicator to start with.
So the question is: Are you the “Subversion King” in your company/project?

P.S.
Again, please, dont take this serious ;)

Random Timestamps in MySQL

Have you ever needed a random timestamp in MySQL? For example to create demo data programmatically? Here’s my solution:

SELECT FROM_UNIXTIME(
  FLOOR(
    UNIX_TIMESTAMP('2007-01-01') +
        RAND() *
	(UNIX_TIMESTAMP('2007-01-03')-UNIX_TIMESTAMP('2007-01-01'))
    )
) as random_timestamp;

Or if you prefer a function:

CREATE FUNCTION random_timestamp (start TIMESTAMP, end TIMESTAMP)
RETURNS TIMESTAMP NOT DETERMINISTIC
RETURN FROM_UNIXTIME(
  FLOOR(
    UNIX_TIMESTAMP(start) +
    RAND() *
    (UNIX_TIMESTAMP(end)-UNIX_TIMESTAMP(start))
  )
);

mysql> select random_timestamp('2007-10-01', NOW());
+---------------------------------------+
| random_timestamp('2007-10-01', NOW()) |
+---------------------------------------+
| 2007-10-05 23:07:11                   |
+---------------------------------------+
1 row in set (0.00 sec)

How to name something that imports and exports ?

Simple Question:
When a script/function/class exports something, I name it “exporter”.
When it imports something, I name it “importer”.
How do I name it when it does both?

What do you think? Write your thoughts into the comments please. Thanks!

MySQL Performance: Use counter tables

I guess many of you know, that using SELECT count(*) FROM table is problematic and slow when using Innodb tables.
This actually only applies to COUNT(*) queries without WHERE a clause as mentioned in the MySQL Performance Blog.

But if you got some slow count query in your application the best way to increase its performance is to replace / remove it.

So if you are going do to "SELECT count(*) FROM products" the best way, is to have a separated table
that stores the number of products. If you're inserting a row increment the counter, if you're deleting a row, decrement it.

Here is some example:
CREATE TABLE counter( number_of_products int(10) DEFAULT '0' NOT NULL);

Increment when you're adding a new product to the products table:

SQL:
  1. UPDATE counter SET number_of_products = number_of_products +1;

Decrement when you're removing a product:

SQL:
  1. UPDATE counter SET number_of_products = number_of_products -1;

In one of my applications i have the case that i have many reads on a table, but just a few updates / writes to the table. The count was in my case not related to a table, but specific to a row of it.

Example:

SQL:
  1. CREATE TABLE user (
  2. user_id int(10) DEFAULT '0' NOT NULL,
  3. username int(10) DEFAULT '0' NOT NULL,
  4. ...
  5. number_of_purchases int(10) DEFAULT '0' NOT NULL,
  6. );

Scenario:
Just imagine that the number of purchases a user has made is somehow complicated to calculate and query. That's why we store the number directly in the user table.
We also have a website that produces a lot of reads to this table column, but (unfortunately) its quite seldom that a user purchases something. So using this complicated count query in each website impression would be slow.

What comes handy here is using MySQL 5 triggers.
With triggers we can calculate the number_of_purchases each time a purchase is made by a user and update the users number_of_purchases column.

Example:

SQL:
  1.  
  2. DELIMITER //
  3. CREATE TRIGGER trg_update_user_purchases after UPDATE ON purchases
  4.  FOR each row BEGIN
  5. call count_and_update_user_purchases(new.user_id);
  6.  END;
  7. //
  8. DELIMITER ;
  9.  
  10. DELIMITER //
  11. CREATE TRIGGER trg_insert_user_purchases after INSERT ON purchases
  12.  FOR each row BEGIN
  13. call count_and_update_user_purchases(new.user_id);
  14.  END;
  15. //
  16. DELIMITER ;
  17.  
  18. DELIMITER //
  19. CREATE TRIGGER trg_delete_user_purchases after DELETE ON purchases
  20.  FOR each row BEGIN
  21. call count_and_update_user_purchases(new.user_id);
  22.  END;
  23. //
  24. DELIMITER ;
  25.  

So each time a records in the purchase table is inserted, updated or deleted the user table gets automaticly updated.

Your application gains the performance from selecting a static int field instead of executing complex count queries. Additionally your application is not bloated with unnecessary code to increment and decrement the count column, because the trigger does this automatically for you.

MySQL: Collation matters when using unique indexes

When using a uniqie index on a text field in mysql, the column collation setting is very important. The collation settings of a column does not only affect sorting and comparsion, but also unique indexes. So you can not insert "a" and "A" into a table that has a unique index on a column that has a case-insensitive collation. The mysql manual about collations: "A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set."

Here is an example:
The column text in table text1 has a case-sensitive collation (_cs suffix), the column in text2 has a case-insensitive collation (_ci suffix).

CODE:
  1.  
  2. CREATE TABLE text1 (
  3.   `text` varchar(50) character set latin1 collate latin1_general_cs NOT NULL default '',
  4.   UNIQUE KEY uniq_text (text)
  5. ) ENGINE=MyISAM
  6. ;
  7. INSERT INTO text1 (text) VALUES ('a');
  8. Query OK, 1 row affected (0.00 sec)
  9.  
  10. INSERT INTO text1 (text) VALUES ('A');
  11. Query OK, 1 row affected (0.00 sec)
  12.  
  13.  
  14. CREATE TABLE text2 (
  15.   `text` varchar(50) character set latin1 collate latin1_general_ci NOT NULL default '',
  16.   UNIQUE KEY uniq_text (text)
  17. ) ENGINE=MyISAM
  18. ;
  19. INSERT INTO text2 (text) VALUES ('a');
  20. Query OK, 1 row affected (0.00 sec)
  21.  
  22. INSERT INTO text2 (text) VALUES ('A');
  23. ERROR 1062 (23000): Duplicate entry 'A' for key 1
  24.  

Constraints are also affected by collation:
The queries on table text1 give different results (a and A), the two queries on table text2 result in the same row twice (a).

CODE:
  1.  
  2. SELECT * FROM text1 WHERE text = 'a';
  3. +------+
  4. | text |
  5. +------+
  6. | a    |
  7. +------+
  8. 1 row in set (0.01 sec)
  9.  
  10. SELECT * FROM text1 WHERE text = 'A';
  11. +------+
  12. | text |
  13. +------+
  14. | A    |
  15. +------+
  16. 1 row in set (0.00 sec)
  17.  
  18. SELECT * FROM text2 WHERE text = 'a';
  19. +------+
  20. | text |
  21. +------+
  22. | a    |
  23. +------+
  24. 1 row in set (0.00 sec)
  25.  
  26. SELECT * FROM text2 WHERE text = 'A';
  27. +------+
  28. | text |
  29. +------+
  30. | a    |
  31. +------+
  32. 1 row in set (0.00 sec)
  33.  

You can display all available collations using

SQL:
  1. SHOW COLLATION;

or

SQL:
  1. SHOW COLLATION LIKE 'latin%';

Dynamic Materialized Views in MySQL

In one of my latest postings I mentioned a way to create semi materialized views in MySQL. The problem was that the solution required a stored procedure for every materialized view. That's very annoying. So I looked for a more general and dynamic approach. The result is another stored procedure with 3 parameters:

  • the name for the source table / view
  • the primary key columns of the source table / view
  • the desired name for the materialized view

Basically this procedure does the following:

  • It drops the materialized view if it already exists
  • It creates the materialized view with the structure and data of the source table
  • It adds a primary key to the newly created materialized view

However the procedure does not create triggers on the table. This could be an improvement for a next version ;)
To call the procedure use:

SQL:
  1. CALL create_mview('big_table', 'id', 'view_m_big_table');

And here's the procedure:

SQL:
  1. DROP PROCEDURE IF EXISTS create_mview;
  2. DELIMITER //
  3. CREATE PROCEDURE create_mview (IN source_table VARCHAR(150), IN primary_key VARCHAR(150), IN target_table  VARCHAR(150))
  4. LANGUAGE SQL
  5. NOT DETERMINISTIC
  6. CONTAINS SQL
  7. SQL SECURITY DEFINER
  8. MODIFIES SQL DATA
  9. BEGIN
  10. -- declare some variables ...
  11. DECLARE m_view_name VARCHAR(150);
  12. DECLARE create_m_view_sql TEXT;
  13. DECLARE drop_m_view_sql TEXT;
  14. DECLARE add_primary_key_sql TEXT;
  15. --
  16. -- create the sql queries to drop the materialized view, create the tmaterialized view
  17. -- and add a primary key to the materialized view.
  18. SET m_view_name:=concat('view_m_', source_table);
  19. SET @drop_m_view_sql:=concat('DROP TABLE IF EXISTS ',target_table);
  20. SET @create_m_view_sql:=concat('CREATE TABLE ', target_table , ' SELECT * FROM ', source_table);
  21. SET @add_primary_key_sql:=concat('ALTER TABLE ', target_table , ' ADD PRIMARY KEY (', primary_key ,')');
  22. --
  23. -- drop the materialized view if it exists
  24. PREPARE stmt1 FROM @drop_m_view_sql;
  25. EXECUTE stmt1;
  26. --
  27. -- create the table
  28. PREPARE stmt2 FROM @create_m_view_sql;
  29. EXECUTE stmt2;
  30. --
  31. -- add the primary key
  32. PREPARE stmt3 FROM @add_primary_key_sql;
  33. EXECUTE stmt3;
  34. --
  35. DEALLOCATE PREPARE stmt1;
  36. DEALLOCATE PREPARE stmt2;
  37. DEALLOCATE PREPARE stmt3;
  38. --
  39. END; //
  40. DELIMITER ;
  41.  

You can download the procedure also.

Checking password strength using md5.rednoize.com and AJAX

With some inspiration from Ben Ramsey i created a little demo for using my md5 database during the signup process on a website.

This example demonstrates the use of the MD5 database at http://md5.rednoize.com and AJAX to check password strength during signup on a website. After supplying a username and a password, a md5 hash of the password is generated using Paul Johnston's md5 javascript library.

The hash of the password (not the password itself) is then sent to http://md5.rednoize.com. If the website returns a result for the given password (hence the hash password combination is stored in the md5 database) it can be regarded as "insecure". Because the md5 hash of the password, and not the password itself is transferred, no sensitive data will be saved at md5.rednoize.com.

For sure not every password that is not stored in the MD5 database can be considered secure. I recommend adding some extra checks (existence of upper and lowercase characters, numbers, special characters and so on) to increase the password strength.

You can see the example in action here: http://md5.rednoize.com/ajax/

Feel free to use the code in this example any play with it ;)
To implement this on your own webserver you would need some kind of proxy script that redirects the AJAX calls from your own server to the md5 database.

Update:

I recommend saving passwords using salted md5 hashes. Salting in short: "When the user sets a password, a short string called the salt is suffixed to the password before encrypting it; the salt is stored along with the encrypted password so that it can be used during verification. Since the salt is different for each user, the attacker can no longer use a single encrypted version of each candidate password. If the salt is long enough, the attacker must repeat the encryption of every guess for each user, and this can only be done after obtaining the encrypted password record for that user."

Here's a little example using salted passwords:

To authenticate users on your website (login) your probably using a SQL statement like this one:
SELECT user_id, username FROM users WHERE passsword = MD5('thepassword');

This is insecure. If someone would know the md5 hash of the password, and the password is weak, it could be "reversed" using the MD5 database.

Use salted passwords to avoid this:
SELECT user_id, username FROM users WHERE passsword = MD5(user_id || 'some_secret_string' || 'thepassword');

Its also possible to store the salt along the password in the database.

Materialized Views in MySQL

Update: Dynamic Materialized Views
Please read this article also. Its an improved versio of the technique described below.

Today i found a workaround for a feature that was realy missing in MySQL.
Let's say you got a really complicated query that requires a lot of cpu and time to complete, but the resulting data is quite static. That's the perfect opportunity to use a "Materialized View". The Bad Thing™ is, mysql does not support what's usually known as materialized views. But today i found a nice workaround for MySQL 5.
But first let's clarify what a materialized view is. I found a good explanation here:
"A materialized view is a stored summary containing precomputes results (originating from an SQL select statement).
As the data is precomputed, materialized views allow for (seemingly) faster dataware query answers."

Okay, so a materialized view saves the result of a query somewhere (e.g. into another table) else.

In my case i had to calculate a rating of users based on votes from other users (Website of http://www.hiphop-battles.com btw) . I'm doing this with some nested views (also a new feature in MySQL5). The whole calculation and query took 0.5 seconds on my athlon xp 3200 machine. Thats way to much to run this query on each page impression.

The trick is a new feature that was introduced in MySQL 5. From the MySQL Manual:
"In MySQL 5.0, you can create one table from another by adding a SELECT statement at the end of the CREATE TABLE statement:
CREATE TABLE new_tbl SELECT * FROM orig_tbl;"

So its possible to create a new table with structure based on an already existing table. The data of the table gets copied also.
To create a table of a very processing intensive view we run the following query:
CREATE TABLE view_m_demo SELECT * FROM view_demo;

To make it little bit better we put this code into a procedure (also a new feature in MySQL5):
DELIMITER //
CREATE PROCEDURE updateDemoView ()
BEGIN
DROP TABLE IF EXISTS view_m_demo;
CREATE TABLE view_m_demo SELECT * FROM view_demo;
END; //
DELIMITER ;

Now we can call this procedure using a trigger:
delimiter //
CREATE TRIGGER T_updateDemoView AFTER INSERT ON some_involved_table
FOR EACH ROW BEGIN
/* calls the procedure to update out materialized view on each insert */
CALL updateDemoView ();
END //
delimiter ;

The same can be done with an "AFTER UPDATE" trigger. dont use database triggers if the data in the original tables get updated frequently.
Another option in this case is to update the view time triggerd via cron.
To update the materialized view every minute use:
*/1 * * * * echo "CALL updateDemoView ();" | mysql -u user --password=pass database

I hope this helps somebody as much as it helps my projects website loading time ;)
before: 0,48 sec
after: 0,003 sec