본문 바로가기
Oracle/운영

[23ai] new feature 벡터 타입과 ai벡터검색 - 유사성검색 (3)

by 취미툰 2024. 11. 12.
반응형

이전글과 이어집니다.

https://bae9086.tistory.com/571

 

이전 글에서는 pdf 파일을 chunk로 테이블에 insert하는 것까지 완료하였습니다.

이 데이터를 통해 유사성 검색을 테스트 해보도록 하겠습니다.

 

검색할 텍스트 : different methods of backup and recovery

임베딩 모델 : ALL_MINILM_L12_V2

 

SQLPLUS에서 수행하는 방법과 디비버(client tool)에서 수행하는 방법으로 테스트해보았습니다.

두개다 방식은 동일하며,

1) 임베딩모델을 이용하여 쿼리 벡터를 먼저 생성한 후

2)쿼리 벡터를 이용하여 유사한 Chunk를  테이블에서 검색합니다.

 

0)

임베딩 모델 확인 

ORA-40284: 모델이 존재하지 않습니다 발생시에는 DROP 후 재생성합니다.

--기존 모델이 있다면 삭제                                   
EXEC DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'ALL_MINILM_L12_V2', force => true);                                  

--생성
BEGIN
   DBMS_VECTOR.LOAD_ONNX_MODEL(
        directory => 'VEC_DUMP',
        file_name => 'all_MiniLM_L12_v2.onnx',
        model_name => 'ALL_MINILM_L12_V2');
END;
/

--확인
SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM,
ALGORITHM_TYPE, MODEL_SIZE
FROM user_mining_models
WHERE model_name = 'ALL_MINILM_L12_V2'
ORDER BY MODEL_NAME;
MODEL_NAME       |MINING_FUNCTION|ALGORITHM|ALGORITHM_TYPE|MODEL_SIZE|
-----------------+---------------+---------+--------------+----------+
ALL_MINILM_L12_V2|EMBEDDING      |ONNX     |NATIVE        | 133322334|

 

1)SQLPLUS에서 확인

 

--PDB 접속
$ sqlplus system@localhost:1521/FREEPDB1

SQL> ACCEPT text_input CHAR PROMPT 'Enter text: '
Enter text: different methods of backup and recovery
SQL> VARIABLE text_variable VARCHAR2(1000)
SQL> VARIABLE query_vector VECTOR
SQL> BEGIN
 :text_variable := '&text_input';
 SELECT vector_embedding(ALL_MINILM_L12_V2 using :text_variable as data)
into :query_vector;
END;
/  2    3    4    5    6
구   2:  :text_variable := '&text_input';
신   2:  :text_variable := 'different methods of backup and recovery';

PL/SQL 처리가 정상적으로 완료되었습니다.

SQL> PRINT query_vector

QUERY_VECTOR
--------------------------------------------------------------------------------
[-8.97755101E-003,8.22615102E-002,-8.34536925E-003,-1.81610305E-002,

 

위에서 설정한 query_vector로 doc_chunks 테이블에서 유사한 chunk를 검색합니다.

 

SQL> SELECT doc_id, chunk_id, chunk_data
FROM doc_chunks
ORDER BY vector_distance(chunk_embedding , :query_vector, COSINE)
FETCH FIRST 4 ROWS ONLY;  2    3    4

    DOC_ID   CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
         2       1912
rolled back, which

means they are undone.

An illustration of the need for transactions is a funds
transfer from a savings account to

a checking account. The transfer consists of the following

    DOC_ID   CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
separate operations:

1. Decrease the savings account.

2. Increase the checking account.

3. Record the transaction in the transaction journal.

Oracle Database guarantees that all three operations succeed or

    DOC_ID   CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
fail as a unit. For

example, if a hardware failure prevents a statement in the
transaction from executing,

         2       2612
all rolled back, which means they are undone. An illustration of the need
for transactions is a funds transfer from a savings account to a checking
account. The transfer consists of the following separate operations:

    DOC_ID   CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------

Chapter 12

DBMS_VECTOR_CHAIN

12-103

1. Decrease the savings account.


    DOC_ID   CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
2. Increase the checking account.

3. Record the transaction in the transaction journal.

Oracle Database guarantees that all three operations succeed or fail as a
unit. For example, if a hardware failure prevents a statement in the

         2        166
DEFINE text_input = '&text'

    DOC_ID   CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------

SELECT '&text_input';

VARIABLE text_variable VARCHAR2(1000)

VARIABLE query_vector CLOB

BEGIN


    DOC_ID   CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
:text_variable := '&text_input';

SELECT vector_embedding(doc_model using :text_variable as data)
into :query_vector;

END;

/


    DOC_ID   CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
PRINT query_vector

8.
Run a similarity search to find, within your books, the first four most relevant
 chunks that
talk about backup and recovery.

         2       1236
'193069E-002,-3.22583504E-002,8.54402035E-002,-2.20514946E-002]';

    DOC_ID   CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------

INSERT INTO doc_queries VALUES (13, 'different methods of backup and
recovery', e);

COMMIT;

END;

/

    DOC_ID   CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------

You can also create a table that holds
BINARY

vectors:
DROP TABLE my_bin_tab PURGE;

CREATE TABLE my_bin_tab(id NUMBER, data VECTOR(16, BINARY));

 

2)디비버 사용시

디비버에서는 한 쿼리로 query_vector를 바로 생성 후 조회합니다.

SELECT doc_id, chunk_id, chunk_data
FROM doc_chunks
ORDER BY vector_distance(chunk_embedding , vector_embedding(ALL_MINILM_L12_V2 USING 'different methods of backup and recovery' as data), COSINE)
FETCH FIRST 4 ROWS ONLY;


DOC_ID|CHUNK_ID|CHUNK_DATA                                                                                                                                                                                                                                                     |
------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
     2|    1912|rolled back, which¶¶means they are undone.¶¶An illustration of the need for transactions is a funds ¶transfer from a savings account to¶¶a checking account. The transfer consists of the following ¶separate operations:¶¶1. Decrease the savings account.¶¶2.|
     2|    2612|all rolled back, which means they are undone. An illustration of the need ¶for transactions is a funds transfer from a savings account to a checking ¶account. The transfer consists of the following separate operations:¶¶Chapter 12¶¶DBMS_VECTOR_CHAIN¶¶12-1|
     2|     166|DEFINE text_input = '&text'¶¶SELECT '&text_input';¶¶VARIABLE text_variable VARCHAR2(1000)¶¶VARIABLE query_vector CLOB¶¶BEGIN¶¶:text_variable := '&text_input';¶¶SELECT vector_embedding(doc_model using :text_variable as data) ¶into :query_vector;¶¶END;¶¶/¶¶|
     2|    1236|'193069E-002,-3.22583504E-002,8.54402035E-002,-2.20514946E-002]';¶¶INSERT INTO doc_queries VALUES (13, 'different methods of backup and ¶recovery', e);¶¶COMMIT;¶¶END;¶¶/¶¶You can also create a table that holds ¶BINARY¶¶vectors: ¶DROP TABLE my_bin_tab PURG|

 

 

출처 : AI Vector Search User's Guide

반응형

댓글