반응형
이전글과 이어집니다.
https://bae9086.tistory.com/571
이전 글에서는 pdf 파일을 chunk로 테이블에 insert하는 것까지 완료하였습니다.
이 데이터를 통해 유사성 검색을 테스트 해보도록 하겠습니다.
검색할 텍스트 : different methods of backup and recovery
임베딩 모델 : ALL_MINILM_L12_V2
SQLPLUS에서 수행하는 방법과 디비버(client tool)에서 수행하는 방법으로 테스트해보았습니다.
두개다 방식은 동일하며,
1) 임베딩모델을 이용하여 쿼리 벡터를 먼저 생성한 후
2)쿼리 벡터를 이용하여 유사한 Chunk를 테이블에서 검색합니다.
0)
임베딩 모델 확인
ORA-40284: 모델이 존재하지 않습니다 발생시에는 DROP 후 재생성합니다.
--기존 모델이 있다면 삭제
EXEC DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'ALL_MINILM_L12_V2', force => true);
--생성
BEGIN
DBMS_VECTOR.LOAD_ONNX_MODEL(
directory => 'VEC_DUMP',
file_name => 'all_MiniLM_L12_v2.onnx',
model_name => 'ALL_MINILM_L12_V2');
END;
/
--확인
SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM,
ALGORITHM_TYPE, MODEL_SIZE
FROM user_mining_models
WHERE model_name = 'ALL_MINILM_L12_V2'
ORDER BY MODEL_NAME;
MODEL_NAME |MINING_FUNCTION|ALGORITHM|ALGORITHM_TYPE|MODEL_SIZE|
-----------------+---------------+---------+--------------+----------+
ALL_MINILM_L12_V2|EMBEDDING |ONNX |NATIVE | 133322334|
1)SQLPLUS에서 확인
--PDB 접속
$ sqlplus system@localhost:1521/FREEPDB1
SQL> ACCEPT text_input CHAR PROMPT 'Enter text: '
Enter text: different methods of backup and recovery
SQL> VARIABLE text_variable VARCHAR2(1000)
SQL> VARIABLE query_vector VECTOR
SQL> BEGIN
:text_variable := '&text_input';
SELECT vector_embedding(ALL_MINILM_L12_V2 using :text_variable as data)
into :query_vector;
END;
/ 2 3 4 5 6
구 2: :text_variable := '&text_input';
신 2: :text_variable := 'different methods of backup and recovery';
PL/SQL 처리가 정상적으로 완료되었습니다.
SQL> PRINT query_vector
QUERY_VECTOR
--------------------------------------------------------------------------------
[-8.97755101E-003,8.22615102E-002,-8.34536925E-003,-1.81610305E-002,
위에서 설정한 query_vector로 doc_chunks 테이블에서 유사한 chunk를 검색합니다.
SQL> SELECT doc_id, chunk_id, chunk_data
FROM doc_chunks
ORDER BY vector_distance(chunk_embedding , :query_vector, COSINE)
FETCH FIRST 4 ROWS ONLY; 2 3 4
DOC_ID CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
2 1912
rolled back, which
means they are undone.
An illustration of the need for transactions is a funds
transfer from a savings account to
a checking account. The transfer consists of the following
DOC_ID CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
separate operations:
1. Decrease the savings account.
2. Increase the checking account.
3. Record the transaction in the transaction journal.
Oracle Database guarantees that all three operations succeed or
DOC_ID CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
fail as a unit. For
example, if a hardware failure prevents a statement in the
transaction from executing,
2 2612
all rolled back, which means they are undone. An illustration of the need
for transactions is a funds transfer from a savings account to a checking
account. The transfer consists of the following separate operations:
DOC_ID CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
Chapter 12
DBMS_VECTOR_CHAIN
12-103
1. Decrease the savings account.
DOC_ID CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
2. Increase the checking account.
3. Record the transaction in the transaction journal.
Oracle Database guarantees that all three operations succeed or fail as a
unit. For example, if a hardware failure prevents a statement in the
2 166
DEFINE text_input = '&text'
DOC_ID CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
SELECT '&text_input';
VARIABLE text_variable VARCHAR2(1000)
VARIABLE query_vector CLOB
BEGIN
DOC_ID CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
:text_variable := '&text_input';
SELECT vector_embedding(doc_model using :text_variable as data)
into :query_vector;
END;
/
DOC_ID CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
PRINT query_vector
8.
Run a similarity search to find, within your books, the first four most relevant
chunks that
talk about backup and recovery.
2 1236
'193069E-002,-3.22583504E-002,8.54402035E-002,-2.20514946E-002]';
DOC_ID CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
INSERT INTO doc_queries VALUES (13, 'different methods of backup and
recovery', e);
COMMIT;
END;
/
DOC_ID CHUNK_ID
---------- ----------
CHUNK_DATA
--------------------------------------------------------------------------------
You can also create a table that holds
BINARY
vectors:
DROP TABLE my_bin_tab PURGE;
CREATE TABLE my_bin_tab(id NUMBER, data VECTOR(16, BINARY));
2)디비버 사용시
디비버에서는 한 쿼리로 query_vector를 바로 생성 후 조회합니다.
SELECT doc_id, chunk_id, chunk_data
FROM doc_chunks
ORDER BY vector_distance(chunk_embedding , vector_embedding(ALL_MINILM_L12_V2 USING 'different methods of backup and recovery' as data), COSINE)
FETCH FIRST 4 ROWS ONLY;
DOC_ID|CHUNK_ID|CHUNK_DATA |
------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2| 1912|rolled back, which¶¶means they are undone.¶¶An illustration of the need for transactions is a funds ¶transfer from a savings account to¶¶a checking account. The transfer consists of the following ¶separate operations:¶¶1. Decrease the savings account.¶¶2.|
2| 2612|all rolled back, which means they are undone. An illustration of the need ¶for transactions is a funds transfer from a savings account to a checking ¶account. The transfer consists of the following separate operations:¶¶Chapter 12¶¶DBMS_VECTOR_CHAIN¶¶12-1|
2| 166|DEFINE text_input = '&text'¶¶SELECT '&text_input';¶¶VARIABLE text_variable VARCHAR2(1000)¶¶VARIABLE query_vector CLOB¶¶BEGIN¶¶:text_variable := '&text_input';¶¶SELECT vector_embedding(doc_model using :text_variable as data) ¶into :query_vector;¶¶END;¶¶/¶¶|
2| 1236|'193069E-002,-3.22583504E-002,8.54402035E-002,-2.20514946E-002]';¶¶INSERT INTO doc_queries VALUES (13, 'different methods of backup and ¶recovery', e);¶¶COMMIT;¶¶END;¶¶/¶¶You can also create a table that holds ¶BINARY¶¶vectors: ¶DROP TABLE my_bin_tab PURG|
출처 : AI Vector Search User's Guide
반응형
'Oracle > 운영' 카테고리의 다른 글
[23ai] ai벡터검색 - vector_distance (0) | 2024.11.13 |
---|---|
[23ai] new feature 벡터 타입과 ai벡터검색 - pdf파일을 백터검색하기 (2) (0) | 2024.11.12 |
[23ai] new feature 벡터 타입과 ai벡터검색 (1) (5) | 2024.11.11 |
DOP Downgrade? (0) | 2024.11.10 |
V$ASM_DISKGROUP의 수치별 설명 (0) | 2024.11.07 |
댓글