Search This Blog

2023/04/17

MySQL: FULL TEXT Search

 Full-text search in MySQL is a powerful feature that allows you to search

for text within a column or set of columns in a table. Unlike simple
pattern matching with the LIKE operator, full-text search allows you to
search for words or phrases within a text string, and rank the results
based on relevance.

To use full-text search in MySQL, you must first create a full-text
index on the column or columns you want to search. This index
is used to speed up searches and improve performance.
Once the index is created, you can use the MATCH() AGAINST()
syntax to perform full-text searches.

SELECT * FROM mytable WHERE MATCH(column1, column2, ...) AGAINST ('search term');

In this example, mytable is the name of the table you want to search,
column1 and column2 are the columns you want to search within,
and search term is the text string you want to search for.

Full-text search is particularly useful for applications that involve
searching large amounts of text data, such
as content management systems, e-commerce sites, and online forums.

Practical Insignt:

CREATE TABLE products (
product_id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
product_name VARCHAR(50),
retail_price DOUBLE,
FULLTEXT(product_name)
) ENGINE = InnoDB;

Insert some records:
INSERT INTO products (product_name, retail_price) VALUES ('LEMON STRAINER WITH 500ML BOTTLE', '3.45');
INSERT INTO products (product_name, retail_price) VALUES ('HARDENED PLASTIC SMART HAIR TRIMMER', '76.35');
INSERT INTO products (product_name, retail_price) VALUES ('SIMCARD ENABLED 4G WIFI ROUTER', '99.95');
INSERT INTO products (product_name, retail_price) VALUES ('GAMING MOUSE WITH SMART LED', '45.55');
INSERT INTO products (product_name, retail_price) VALUES ('JUICE - 500ML LEMON', '2.45');
INSERT INTO products (product_name, retail_price) VALUES ('5G SMART PHONE WITH NANO SIM', '369.75');
INSERT INTO products (product_name, retail_price) VALUES ('LED SMART PLASTIC WATCH', '89.85');
INSERT INTO products (product_name, retail_price) VALUES ('WI-FI ROUTER WITH SIM SLOT', '75.95');
INSERT INTO products (product_name, retail_price) VALUES ('LED TORCH - HARDENED PLASTIC', '25.62');
INSERT INTO products (product_name, retail_price) VALUES ('NANO SIM CARD CUTTER', '5.65');
INSERT INTO products (product_name, retail_price) VALUES ('DY MODEL 2T AIR CONTROL VALVE', '5.65');
INSERT INTO products (product_name, retail_price) VALUES ('SIMCARD ENABLED 4G WIFI', '75.00');
INSERT INTO products (product_name, retail_price) VALUES ('ROUTER', '85.00');
INSERT INTO products (product_name, retail_price) VALUES ('5G PHONE SMART WITH NANO SIM', '369.75');
Types:
There are three types of Full-Text searches.

1) Natural language full-text searches: Interprets text as free text in natural human language.

SELECT product_id,product_name,retail_price
FROM products
WHERE MATCH(product_name) AGAINST('WIFI ROUTER' IN NATURAL LANGUAGE MODE);


Output:
+------------+--------------------------------+--------------+
| product_id | product_name | retail_price |
+------------+--------------------------------+--------------+
| 3 | SIMCARD ENABLED 4G WIFI ROUTER | 99.95 |
| 12 | SIMCARD ENABLED 4G WIFI | 75 |
| 8 | WI-FI ROUTER WITH SIM SLOT | 75.95 |
| 13 | ROUTER | 85 |
+------------+--------------------------------+--------------+
Here 12th row is having value word WIFI not complete word 'WIFI ROUTER' while 13th row has word ROUTER
but not whole word 'WIFI ROUTER'



2) Boolean full-text searches: For advanced queries based on complex Boolean operators such as +.
In Boolean Full-Text mode, you can put certain characters in the beginning or at the end of a
keyword when running a search. Use the + and the - operators to indicate the words you want
to include or exclude in the results.

USING + operator:

SELECT product_name,retail_price
FROM products
WHERE MATCH(product_name) AGAINST('LED +PLASTIC' IN BOOLEAN MODE);

Output:
+-------------------------------------+--------------+
| product_name | retail_price |
+-------------------------------------+--------------+
| LED SMART PLASTIC WATCH | 89.85 |
| LED TORCH - HARDENED PLASTIC | 25.62 |
| HARDENED PLASTIC SMART HAIR TRIMMER | 76.35 |
+-------------------------------------+--------------+

word 'PLASTIC' is required to be present while word 'LED' may or may not

USING - operator:

SELECT product_name,retail_price FROM products
WHERE MATCH(product_name) AGAINST('LED -PLASTIC' IN BOOLEAN MODE);

Output:
+-----------------------------+--------------+
| product_name | retail_price |
+-----------------------------+--------------+
| GAMING MOUSE WITH SMART LED | 45.55 |
+-----------------------------+--------------+

return results that match the keyword LED but do not contain the word plastic

USING DOUBLE QUOTE:

SELECT product_name,retail_price
FROM products
WHERE MATCH(product_name) AGAINST('"SMART PHONE"' IN BOOLEAN MODE);

Output:
+------------------------------+--------------+
| product_name | retail_price |
+------------------------------+--------------+
| 5G SMART PHONE WITH NANO SIM | 369.75 |
+------------------------------+--------------+

you can enclose your keyword(s) in double-quotes to return only the rows
that literally match the words as you have typed them in the same order.

You can look at our insert their is one record with containg TEXT PHONE SMART
which does not come in our result set
3) Query expansion full-text searches: Based on compound and extended searches.

You can use a full-text search with a QUERY EXPANSION statement only
if the search phrase is too short. This adds relevant rows to the results.

SELECT product_name,retail_price
FROM products
WHERE MATCH(product_name) AGAINST('WIFI ROUTER' IN NATURAL LANGUAGE MODE);

+--------------------------------+--------------+
| product_name | retail_price |
+--------------------------------+--------------+
| SIMCARD ENABLED 4G WIFI ROUTER | 99.95 |
| SIMCARD ENABLED 4G WIFI | 75 |
| WI-FI ROUTER WITH SIM SLOT | 75.95 |
| ROUTER | 85 |
+--------------------------------+--------------+
QUERY EXPANSION runs search twice.In first part, search against the given keyword such as WIFI ROUTER, and once it retrieves high relevant rows,
then conducts a second search based on the values of the top-ranked rows returned during the first search.



SELECT product_name,retail_price
FROM products
WHERE MATCH(product_name) AGAINST('WIFI ROUTER' WITH QUERY EXPANSION);

Output:
+--------------------------------+--------------+
| product_name | retail_price |
+--------------------------------+--------------+
| SIMCARD ENABLED 4G WIFI ROUTER | 99.95 |
| SIMCARD ENABLED 4G WIFI | 75 |
| WI-FI ROUTER WITH SIM SLOT | 75.95 |
| ROUTER | 85 |
| 5G SMART PHONE WITH NANO SIM | 369.75 |
| NANO SIM CARD CUTTER | 5.65 |
| 5G PHONE SMART WITH NANO SIM | 369.75 |
+--------------------------------+--------------+

Here we got additional rows than previous one.rows after 4th row are searched based on result of previous query.
5,6 7 rows are somehow related to search term 'WIFI ROUTER'.In this 3 record 'SIM' word is present from 1st 3 records.

Relevance Ranking:
Relevance is the measure of accuracy used by the MySQL server to sort returned
rows based on their importance and this is highly determined by the total number
of words that match a search term in a record as well as their order.

SELECT product_id,product_name,retail_price,
MATCH(product_name) AGAINST ('LED SMART WATCH' IN NATURAL LANGUAGE MODE) AS relevance_score
FROM products
WHERE MATCH(product_name) AGAINST('LED SMART WATCH' IN NATURAL LANGUAGE MODE);

Output:
+------------+-------------------------------------+--------------+---------------------+
| product_id | product_name | retail_price | relevance_score |
+------------+-------------------------------------+--------------+---------------------+
| 7 | LED SMART PLASTIC WATCH | 89.85 | 1.961129903793335 |
| 4 | GAMING MOUSE WITH SMART LED | 45.55 | 0.6475204229354858 |
| 9 | LED TORCH - HARDENED PLASTIC | 25.62 | 0.4475700855255127 |
| 2 | HARDENED PLASTIC SMART HAIR TRIMMER | 76.35 | 0.19995030760765076 |
| 6 | 5G SMART PHONE WITH NANO SIM | 369.75 | 0.19995030760765076 |
| 14 | 5G PHONE SMART WITH NANO SIM | 369.75 | 0.19995030760765076 |
+------------+-------------------------------------+--------------+---------------------+

Here first row got highest ranking as all search keywords almost in the same order.

TWO WORD SEARCH :
MySQL server may not return results based on keyword terms that are shorter than 3

SELECT product_name,retail_price
FROM products
WHERE MATCH(product_name)
AGAINST('DY' IN NATURAL LANGUAGE MODE);

Output:
Output is empty no row returned.

Minimuim word allowed in Full text search controlled by InnoDB variable below

SHOW VARIABLES LIKE '%innodb_ft_min_token_size%';
Output:
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_ft_min_token_size | 3 |
+--------------------------+-------+

To change this variable value

Go to /etc/mysql/my.cnf.Edit innodb_ft_min_token_size constant as below

innodb_ft_min_token_size = 2

after this restart mysql service
sudo systemctl restart mysql-server

LIST FULL TEXT INDEXES:

SELECT TABLE_SCHEMA,TABLE_NAME,INDEX_NAME
FROM information_schema.statistics
WHERE
index_type LIKE 'FULLTEXT%';

+--------------+------------+--------------+
| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME |
+--------------+------------+--------------+
| play | products | product_name |
+--------------+------------+--------------+


Rebuilding Full Text Index: DROP FIRST & then recreate.
ALTER TABLE products DROP INDEX product_name;
CREATE FULLTEXT INDEX product_name ON products(product_name);

Reference:
https://www.vultr.com/docs/how-to-implement-mysql-fulltext-search/?psafe_param=1&utm_source=performance-max-apac&utm_medium=paidmedia&obility_id=16876059738&utm_adgroup=&utm_campaign=&utm_term=&utm_content=&gclid=CjwKCAjw3POhBhBQEiwAqTCuBio6Uw1H2VUsY7T_gelCmTCWbNpVzOK2kLQVwfxDvrR5ZN_3Y4B8JhoCY-gQAvD_BwE

No comments:

Post a Comment