import os, tempfile
from onprem import LLM

vectordb_path = tempfile.mkdtemp()

llm = LLM(
    embedding_model_name="sentence-transformers/nli-mpnet-base-v2",
    embedding_encode_kwargs={"normalize_embeddings": True},
    vectordb_path=vectordb_path,
)
Semantic Similarity
The underlying vector database in OnPrem.LLM can be used for detecting semantic similarity among pieces of text.
data = [  # from txtai
    "US tops 5 million confirmed virus cases",
    "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
    "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
    "The National Park Service warns against sacrificing slower friends in a bear attack",
    "Maine man wins $1M from $25 lottery ticket",
    "Make huge profits without work, earn up to $100,000 a day",
]

source_folder = tempfile.mkdtemp()
for i, d in enumerate(data):
    filename = os.path.join(source_folder, f"doc{i}.txt")
    with open(filename, "w") as f:
        f.write(d)

llm.ingest(source_folder, chunk_size=500, chunk_overlap=0)
Creating new vectorstore at /tmp/tmpqbwhmx3v
Loading documents from /tmp/tmpocpf9fe4
Loading new documents: 100%|█████████████████████| 6/6 [00:00<00:00, 931.07it/s]
Loaded 6 new documents from /tmp/tmpocpf9fe4
Split into 6 chunks of text (max. 500 chars each)
Creating embeddings. May take some minutes...
Ingestion complete! You can now query your documents using the LLM.ask method
Here, we get a reference to the underlying vector store and query it directly to find the best semantic match.
db = llm.load_ingester().get_db()

for query in (
    "feel good story",
    "climate change",
    "public health story",
    "war",
    "wildlife",
    "asia",
    "lucky",
    "dishonest junk",
):
    docs = db.similarity_search(query)
    print(f"{query} : {docs[0].page_content}")
feel good story : Maine man wins $1M from $25 lottery ticket
climate change : Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg
public health story : US tops 5 million confirmed virus cases
war : Beijing mobilises invasion craft along coast as Taiwan tensions escalate
wildlife : The National Park Service warns against sacrificing slower friends in a bear attack
asia : Beijing mobilises invasion craft along coast as Taiwan tensions escalate
lucky : Maine man wins $1M from $25 lottery ticket
dishonest junk : Make huge profits without work, earn up to $100,000 a day
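
To give a rough picture of what a similarity search like the one above is doing, here is a minimal sketch. It is not OnPrem.LLM's actual implementation. It assumes the embeddings are scaled to unit length, as requested by `normalize_embeddings` above; for unit-length vectors, ranking by dot product, cosine similarity, or Euclidean distance gives the same order. The three-dimensional vectors and the names `normalize`, `top_match`, and `toy_index` are hypothetical and chosen only for illustration; real embeddings come from the embedding model and have many more dimensions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_match(query_vec, index):
    """Return the text whose unit vector has the largest dot product with the query."""
    q = normalize(query_vec)
    return max(index, key=lambda text: sum(a * b for a, b in zip(q, index[text])))


# Hypothetical toy "embeddings"; real ones would be produced by the embedding model.
toy_index = {
    "lottery win": normalize([0.9, 0.1, 0.0]),
    "ice shelf collapse": normalize([0.1, 0.9, 0.2]),
    "virus cases": normalize([0.0, 0.2, 0.9]),
}

# A made-up query vector that points mostly along the second axis.
best = top_match([0.2, 1.0, 0.1], toy_index)
print(best)
```

The query vector here lies closest in direction to the "ice shelf collapse" entry, so that text is returned, in the same way that the query "climate change" retrieved the ice shelf headline above.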