Full Text Search With Sqlite SQLite

Mar 6th, 2020 - written by Kimserey with .

Full text search allows users to search records in data storage using properties of the objects. For example a full text search on a blog engine could allow a user to search for a post based on the title or content of the post or categories. The text provided could be a word, or a sentence or part of a sentence and the result would be a list of posts ranked by best match. In today’s post we will look at how we can take implement fts using SQLite built in fts5 module.

Create Index Table

To demonstrate how the Full Text Search table works, we start by creating a user table:

1
2
3
4
5
6
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    email TEXT NOT NULL UNIQUE,
    short_description TEXT
)

In our example, we will be enabling full text search on the username and short_description. To do so we make use of SQLite FTS5 module with the syntax USING fts5():

1
2
3
4
5
6
7
CREATE VIRTUAL TABLE user_fts USING fts5(
    username, 
    short_description, 
    email UNINDEXED, 
    content='user', 
    content_rowid='id' 
)

content will specify a content table in order to tie the index data with existing physical data from another table, here our user table. We also specify the content_rowid to match with the user.id. This query will then create the following tables:

1
2
3
4
5
6
7
8
9
10
11
CREATE VIRTUAL TABLE user_fts USING fts5(
    username,
    short_description,
    email UNINDEXED,
    content='user',
    content_rowid='id'
)
CREATE TABLE 'user_fts_config'(k PRIMARY KEY, v) WITHOUT ROWID
CREATE TABLE 'user_fts_data'(id INTEGER PRIMARY KEY, block BLOB)
CREATE TABLE 'user_fts_docsize'(id INTEGER PRIMARY KEY, sz BLOB)
CREATE TABLE 'user_fts_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID

We added username and short_description for indexing and added email as unindexed, this allows us to retrieve the data from user_fts without enabling search on the email row.

_config, _data, _docsize and _idx are created automatically by the fts module, in order to enable indexing on the user table. When using content for storing data of index, we are responsible for maintaining the indexed data up to date when the content table is changed.

For example if we were to add the following users:

1
2
3
4
5
6
7
8
9
INSERT INTO user (
    username, 
    email, 
    short_description
) VALUES 
    ('user_1', '[email protected]', 'I am a senior designer with experience in Javascript'),
    ('user_2', '[email protected]', 'I am a web developer working on Angular'),
    ('user_3', '[email protected]', 'I am a backend junior developer and have worked in Finance'),
    ('user_4', '[email protected]', 'I am a senior engineer, expert in concurrent backend system')

We would be able to query the content of the user_fts:

1
2
3
4
5
6
7
8
9
SELECT * FROM user_fts
+----------+-------------------------------------------------------------+------------------+
| username | short_description                                           | email            |
+----------+-------------------------------------------------------------+------------------+
| user_1   | I am a senior designer with experience in Javascript        | user_1@gmail.com |
| user_2   | I am a web developer working on Angular                     | user_2@gmail.com |
| user_3   | I am a backend junior developer and have worked in Finance  | user_3@gmail.com |
| user_4   | I am a senior engineer, expert in concurrent backend system | user_4@gmail.com |
+----------+-------------------------------------------------------------+------------------+

But if we were to execute a search, we will have no result:

1
2
SELECT * FROM user_fts WHERE user_fts MATCH 'I' ORDER BY rank;
0 rows in set

The reason for that is that even though the content of the table indexed has the data of user, the indexed data aren’t populated. In order to maintain the consistency between data and indexes, we need to create triggers.

Create Triggers

We create three triggers, one for insert, one for delete and one for update:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
CREATE TRIGGER user_ai AFTER INSERT ON user
    BEGIN
        INSERT INTO user_fts (rowid, username, short_description)
        VALUES (new.id, new.username, new.short_description);
    END;

CREATE TRIGGER user_ad AFTER DELETE ON user
    BEGIN
        INSERT INTO user_fts (user_fts, rowid, username, short_description)
        VALUES ('delete', old.id, old.username, old.short_description);
    END;

CREATE TRIGGER user_au AFTER UPDATE ON user
    BEGIN
        INSERT INTO user_fts (user_fts, rowid, username, short_description)
        VALUES ('delete', old.id, old.username, old.short_description);
        INSERT INTO user_fts (rowid, username, short_description)
        VALUES (new.id, new.username, new.short_description);
    END;

Deleting is done by adding a special value delete in the special column user_fts (the name of the fts table). Here we only make changes on the indexed columns as those are the only one that needs to be kept in sync.

If we create now our users, we’ll see that we will be able to query them:

1
2
3
4
5
6
7
8
9
SELECT * FROM user_fts WHERE user_fts MATCH 'I' ORDER BY rank;
+----------+-------------------------------------------------------------+------------------+
| username | short_description                                           | email            |
+----------+-------------------------------------------------------------+------------------+
| user_2   | I am a web developer working on Angular                     | user_2@gmail.com |
| user_1   | I am a senior designer with experience in Javascript        | user_1@gmail.com |
| user_4   | I am a senior engineer, expert in concurrent backend system | user_4@gmail.com |
| user_3   | I am a backend junior developer and have worked in Finance  | user_3@gmail.com |
+----------+-------------------------------------------------------------+------------------+

Now that we are all setup to index our data, we can start looking at how we can query.

Query Index

Using full text search we are able to order by a special field rank. This provides a ranking of the result from the MATCH provided. For example:

1
2
3
4
5
6
7
8
SELECT * FROM user_fts WHERE user_fts MATCH 'senior OR backend' ORDER BY rank;
+----------+-------------------------------------------------------------+------------------+
| username | short_description                                           | email            |
+----------+-------------------------------------------------------------+------------------+
| user_4   | I am a senior engineer, expert in concurrent backend system | user_4@gmail.com |
| user_1   | I am a senior designer with experience in Javascript        | user_1@gmail.com |
| user_3   | I am a backend junior developer and have worked in Finance  | user_3@gmail.com |
+----------+-------------------------------------------------------------+------------------+

will get the users that have a short description containing senior or backend where senior + backend is ranked higher.

The power of the full text search can be seen here by providing tokens, we are able to retrieve indexed users:

1
2
3
4
5
6
SELECT * FROM user_fts WHERE user_fts MATCH 'backend senior' ORDER BY rank;
+----------+-------------------------------------------------------------+------------------+
| username | short_description                                           | email            |
+----------+-------------------------------------------------------------+------------------+
| user_4   | I am a senior engineer, expert in concurrent backend system | user_4@gmail.com |
+----------+-------------------------------------------------------------+------------------+

by simply asking for backend senior, SQLite is able to make sense of our request which was to retrieve a user with a short description vaguely matching backend senior. Here’s another example:

1
2
3
4
5
6
SELECT * FROM user_fts WHERE user_fts MATCH 'senior javascript' ORDER BY rank;
+----------+------------------------------------------------------+------------------+
| username | short_description                                    | email            |
+----------+------------------------------------------------------+------------------+
| user_1   | I am a senior designer with experience in Javascript | user_1@gmail.com |
+----------+------------------------------------------------------+------------------+

And we can see how our triggers work, if we update the decriptions of all our users:

1
UPDATE user SET short_description='Removed'

We will see that the descriptions will no longer match the previous tokens:

1
2
3
SELECT * FROM user_fts WHERE user_fts MATCH 'senior OR backend' ORDER BY rank;
0 rows in set
Time: 0.001s

And we will be able to query using the updated short description.

1
2
3
4
5
6
7
8
9
SELECT * FROM user_fts WHERE user_fts MATCH 'removed' ORDER BY rank;
+----------+-------------------+------------------+
| username | short_description | email            |
+----------+-------------------+------------------+
| user_1   | Removed           | user_1@gmail.com |
| user_2   | Removed           | user_2@gmail.com |
| user_3   | Removed           | user_3@gmail.com |
| user_4   | Removed           | user_4@gmail.com |
+----------+-------------------+------------------+

In the same way, we can query using the boolean operations OR, AND and NOT. There are much more we can do in term of query with fts but I’ll leave it for you to explore!

Conclusion

Today we looked into Full Text Search using the SQLite fts module. We started by looking into how to create an index table, then we moved on to see how we could create triggers and why it was necessary to create triggers, and lastly we completed the post by exploring queries on the indexes and how the triggers affected the queries. I hope you liked this post and I see you on the next one!

External Sources

Designed, built and maintained by Kimserey Lam.