Home


Mysql insert binary data from TAB files

Motivation We are working with a lot of data for the microbeAtlas.org project and we keep part of it (that needs to be text searchable) in a Mysql database. Alongside this data, we also would like to keep JSON objects that are large to some extend, and we compress them and store them in a binary form.

Preparing data in TAB files Turns out the fastest way to populate a Mysql table is to load the data from a TAB file. Storing numbers and text in a text file is simple, however how do you store binary data and load it into Mysql? Turns out you need to use something like hexlify, so to represent the binary data in hex format, which can be stored as text. Let's look at how you would do that in Python:

import zlib
import binascii

# compression and storage
my_data = "Some data I would like to compress and store as binary in hex format."
my_data_compressed = zlib.compress(my_data.encode())
my_data_compressed_hex = binascii.hexlify(my_data_compressed)

# write the data to the file
f = open("data.tab", "wt")
f.write("column1\tcolumn2\t%s" % my_data_compressed_hex)
Ekola, you would have now the data in a TAB delimited file (one line per row).

Loading (storing) data into Mysql tables
How to store and load into Mysql? You would create a sql file in this direction:

DROP TABLE IF EXISTS my_table;
CREATE TABLE `my_table` (
	  `column_1` varchar(255) NOT NULL,
	  `column_2` text,
	  `column_3` LONGBLOB
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

LOAD DATA LOCAL INFILE "data.tab" INTO TABLE my_table FIELDS TERMINATED BY '\t' \
(column_1, column_2, @column_3_val) SET column_3 = UNHEX(@column_3_val);
Notice the UNHEX for the column_3 field, since we need to store in binary and not in hex strings. Voila. Thanks for reading! and if it was helpful for you, leave a comment below.

References

Stackoverflow: Import hex/binary data into mysql