Chapter 5 Making Tables and Importing Data into MySQL

Chapter 5

Making Tables and Importing Data into MySQL

What you will learn:

How to make new tables and import data in MySQL using the Navicat and

Sequel Pro front ends, and the MySQL LOAD DATA INFILE command.

For this tutorial, we assume you have created a connection to MySQL using either

Navicat or Sequel Pro. See the tutorial Exploring Navicat and Sequel Pro if you

need to do this. For a general discussion on MySQL table creation and import, see

Chapter 5 of The Data Journalist.

We’ll start with the process of making a table to which data will be added later.

To begin, in Navicat, right click on the Tables item within the listing for your

database, in the table of contents area, and choose New Table from the menu.

This will open the table creation dialogue. You can also click on the New Table icon.

On this screen we can enter the names for the fields in the table, set the datatypes, set

options such as the number of decimal places in decimal and floating type fields,

whether a field can contain null values, and the default value of a field, and set the

primary key for the table, if there is to be one.

We’ll work with the demonstration data on political donations to which you were

introduced in Chapter 5. Delimited text files containing the three demonstration

tables are available for download on this book’s companion website.

First, we’ll create the Donor ID field using an integer data type, make it the primary

key by clicking on the Primary Key icon, and make it an auto increment field by

clicking on the auto increment check box. This means the automatically be populated

by a number, starting from 1 and counting upward, as each new record is added.

If you wish the field to be the primary key for the table, click on the Primary Key

icon.

To add another field, click on the Add Field icon.

In the image below, we’ve added the remainder of the fields, each with a data type.

Several of the text fields use the varchar data type as they will have different lengths

and varchar fields take up only space taken by the data in each row. The maximum

length is left at the default value of 255.

The Donation_Date field uses the Date type as we want to enter actual dates. The

Donor_Age field is set as an integer field as ages will always be whole numbers, and

the Donation_Amount is set as a decimal field, to ensure the greatest precision.

Please see Chapter 5 for a full discussion of data types.

When you enter fields in this way, the front-end writes the necessary SQL table

creation statement. You can see the statement in the SQL Preview tab. This is a great

way to learn the syntax.

Of course, at this point, we haven’t given our table a name, so the create table syntax

is using the generic NewTable name. However, we can give the table a name, and

create it at the same time via the Save dialogue.

When you click Save, you will be prompted for a name.

When you click OK, the table-creation statement will be run in the background, and

the new table will appear in the table of contents.

If you double click on the table name, you will open it for editing. You can type data

directly into the table, or use the LOAD DATA INFILE SQL command to import

data directly, as we’ll explore a little later in this tutorial.

The process of creating a table in Sequel Pro is similar.

In Sequel Pro, you begin the table creation process by clicking on the small plus sign

in the lower left of the main user interface.

Here, you can give your table a name and choose the character encoding and collation

for the table. We've left the defaults. The encoding refers to which character set is

being used for characters, in this case Latin1, which handles western languages well.

The collation refers to the way that characters are compared to one another when

comparison operators such as =, > and <> are used. The "ci" collations are case

insensitive, which means that a letter z will be considered equal to a letter Z. There are

also "bin" collations that are character sensitive and would consider z and Z to be

different. Generally speaking for queries by journalists, you'll want the more

permissive collation.

Table type refers to the database engine, in this case the default MySQL InnoDB.

There is usually little reason to change this, but you'll find a complete explanation of

the differences between different database engines in the MySQL documentation.

When we click Add, a screen for entering the table structure will appear.

Notice that Sequel Pro automatically adds a primary key to the table, with the name

id. It is an auto-incrementing, unsigned (no negative numbers) integer. We can modify

this later to take the name used in our data, Donor_ID.

To add a new field, click in the + sign in the lower left of the table design area. In the

image above, that is the higher of the two + signs. The one at the very bottom is for

adding indexes.

Clicking the + sign adds a new line where we can add another field.

Clicking on the small up and down arrow icon exposes a drop-down list of data types.

We'll choose Varchar, our standard type for character fields of up to 255 characters.

We'll choose a length of 255, remembering that MySQL will only store the actual

number of characters used in each row, by that field.

Here is the table, filled out with the remaining fields and data types.

An argument could be made for making the Donor_Gender field a CHAR field with a

length of 1. This would prevent the entry of anything more than one letter, which

could help prevent mistakes.

When you are satisfied with the table structure, click on the small, curved arrow. This

will run the table creation query that Sequel Pro has quietly been writing in the

background. You can now switch to the Content tab, to see the new, empty table.

To begin adding data, click on the small plus sign at the bottom of the table area (the

one further to the right). You can now type in your rows of data. Or, if you prefer,

you can use the import wizard or LOAD DATA INFILE to add external data.

Using LOAD DATA INFILE

LOAD DATA INFILE is a fast and powerful way to get delimited text data into a

MySQL table. Its advantages are speed and, on the surface at least, simplicity.

The file we are going to import is called Donations.csv.

As with any such import, you should look at the text file in an ordinary text editor

such as Notepad or TextEdit, so you can become familiar with the data structure. It is

essential that the data structure of the imported data and the data structure of the

table into which it are imported are identical. The number of fields must be the same,

and the type of data to be imported into each field in the database table must be

compatible with the data type of that field. If, as an example, a column in the CSV has

alphabetical characters in it and the field is an integer field, the data either won’t be

imported, or the entire import will fail, depending on the SQL mode set for the

database.

If you would prefer to be given warnings rather than have the import fail, you should

set the SQL mode to ANSI. In a front end, or at the SQL prompt when using the

SQL command line program, enter the following:

SET sql_mode = ‘ANSI’

The complete syntax for the LOAD DATA INFILE command is available in the

MySQL documentation, but a typical statement will have these clauses:

LOAD DATA INFILE ‘complete path to text file’

INTO TABLE TableName

FIELDS TERMINATED BY ‘,’

OPTIONALLY ENCLOSED BY ‘”’

LINES TERMINATED BY ‘\n’

IGNORE n LINES

The character you enter in the FIELDS TERMINATED BY line is the delimiter (in

the above column a comma) and the character(s) entered in the LINES

TERMINATED BY line will depend on the operating system that created the

delimited text file. Files created on Macs may have only the ‘\n’ or ‘\r\ characters,

while those created by Windows machines may end with the combination ‘\r\n’

Looking at the text file in an editor that shows hidden characters can help you

determine in the line-ending character. If you use the wrong one, the worst that will

happen is that MySQL will try to load the entire file into one row. Try one of the

other combinations if that happens.

Here is what the command would look like for the Donations data.

LOAD DATA INFILE ‘c:/mydata/Donations.csv’

INTO TABLE Donations

FIELDS TERMINATED BY ‘,’

OPTIONALLY ENCLOSED BY ‘”’

LINES TERMINATED BY ‘\r\n’

IGNORE 1 LINES

If there are problems with the import, these will be reported as warnings, if you

turned off strict mode, or the first error will be reported as an error, if strict mode is

on (and the import will stop). Troubleshooting issues with LOAD DATA INFILE is

one of the reasons why many users prefer to use the import utility available in front

ends such as Navicat or Sequel Pro.

These utilities behave in much the same way as the text import wizard in Microsoft

Access. They automatically examine the structure of the text file to be imported, then

create the table and import the data in one operation. If the table has already been

created, they can import directly it.

For this exercise, we will presume a new table needs to be created. We’ll call it

Donations2 to differentiate it from the table we created by hand, but if you are using

the wizard interface to create table for the first time, feel free to just call it Donations.

In Navicat, click on the large Table icon at the top of the screen, then choose Import

Wizard (or right click on Tables in the table of contents, and choose Import Wizard),

to bring up the first step of the wizard.

You will see the file formats that can be imported by Navicat Essentials.

For delimited text, you have the choice of Text file, which allows for importing any

delimited text file, or CSV, which assumes a comma-delimited text file.

We will choose Text file, but the choices are the same for the CSV file, except that the

file must be a csv.

Click Next to bring up step 2.

Enter the name of the file you wish to enter, and click Next.

The record delimiter is the character that ends each record, and this will either be the

carriage return/new line combination common on files created in Windows, a new

line character, or a carriage return character.

The field delimiter is the character that delimits each field, in our case a comma, but

also possibly a tab character, space or other symbol. The text qualifier is a character,

usually a single or double quotation mark, that may appear around complete field

entries. This ensures that any delimiter characters that are contained within the field

itself won’t be treated as field separators. The use of a text qualifier is common in csv

files because it is common to have commas contained within text or numbers.

These are the completed options for the donations data.

Click Next for more options.

If your examination of the text file showed that it had field names in the first two,

leave the top value as is. Otherwise, change to the appropriate row, or leave it blank if

there are no field names present. The second box allows you to indicate where the

first row of actual data appears.

The remaining options allow you to specify how data is arranged in the text file, such

as the order of elements in dates, the symbol used between date elements, and the

symbol used for decimals.

When you are finished making your choices, click Next.

Here, we choose whether to import the data into an existing table, or a new one.

We’ve renamed the Target Table as Donations2

In our case, we can see that Navicat has made every field a character field. We can

now change the fields that should be of other data types.

Click Next.

On the next screen, you will typically leave the options as is, though in the Advanced

options you can change what is inserted if the import wizard encounters NULL

(empty) values, and whether the import will continue if there is an error. Generally, it

is better to continue on errors, then examine any warnings produced to see if critical

data is missing.

Click OK, then Next, and finally Start to begin the import process.

The final screen reports on the import, and whether there were any errors. If there are

errors, they are recorded in a log file, so you can see what went wrong and make

changes to the import settings or the source text file before trying again.

You’re done. The table is imported into MySQL and you can query it.

Using the Sequel Pro import wizard

The import process in Sequel Pro is similar to that in Navicat.

Begin by connecting to the MySQL Server and selecting a database. If you are unsure

how, see the tutorial Exploring Navicat and Sequel Pro

To begin your file import, Choose Import from the File menu.

On the screen that opens, navigate to the folder containing the file you wish to

import, in this case Donations.csv On the same page, you set the various attributes of

the file you wish to import, including whether the fieldnames are in the first line, the

field and line terminators, whether there is a text qualifier (Fields enclosed by), and

how any quotation marks within fields are escaped.

In our case, the file is a CSV file, which means fields are terminated by commas. The

file was originally created on a Windows machine, so the file uses the windows \r\n

terminator for lines, the field qualifier is a double quotation mark, and quotation

marks can be escaped either by quotation marks or the \ slash character.

Click on Options to toggle these settings from visible to hidden and vice versa.

When you are done, click Open. Sequel Pro will provide its best guess at the data type

for each field.

As you can see, it has interpreted the Donor_ID as an integer, as well as the donor

age, but is suggesting that the Donation_Date and the Donation_Amount both be

varchar text fields when they ought to be date and decimal fields respectively. The

remainder of the fields have been correctly assessed as text fields and the Varchar data

type suggested. Again, this is the most flexible data type for fields containing up to

255 characters of text data.

We'll change the Donation_Date field to a date field, from the dropdown list, and

because Sequel Pro doesn't offer decimal as a data type in the drop down, we'll double

click on the data type beside the Donation_Amount field and replace

VARCHAR(255) with DECIMAL(20,2), or a decimal field with a length of 20 and

two decimal places. You can manually change any data type this way.

Be sure also to give the table a name as we have here.

Click Import when you are done setting the data types for all of the fields and naming

the table.

Barring fatal errors, your data will import as seen below.

You are ready to start querying the data, or to import additional tables. If there are

already tables in the database, Sequel Pro will give you the option of adding the new

data to an existing table, or creating a new one with a new name.

It will be imported into a new table. Click Next to see how Navicat has interpreted

the data and the choices it is suggesting for the table structure.

As you enter each value, hit the TAB key to move right to the next field. Those fields

that were set as not allowing NULL values will require a value to be entered.

When you have finished adding the first record, you can click on the checkmark at the

bottom of the screen to commit the record to the table. Clicking on the + sign allows

you to add a new record. Continue until you have finished adding all of the records.

If, at any point, you realize that you need to add additional fields, right click on the

table name and choose Design Table. Let’s say we wanted to add fields for the years

of the start and end of each president’s term(s) in office. Click on Add Field and you

can add a new field as you did when you first created the table.

Save the table, and you can add the additional information.

Continue until you have completed adding all of your data.

Creating tables and adding data with the import wizard

If you are importing external data from a CSV file