nyx_extras.utils
Module that contains utility functions, as well as tooling for manual parsing of data contained in Nyx.
Classes
Utils – Utility methods for query (system) prompt modification.
Parser – A class for processing and querying datasets from Data instances.
Module Contents
- class nyx_extras.utils.Utils
Utility methods for query (system) prompt modification.
- static with_sources(prompt, **kwargs)
Expand the prompt with a clause requesting that the data sources considered be included in the response.
- static build_query(prompt, **kwargs)
Base prompt builder.
- class nyx_extras.utils.Parser
A class for processing and querying datasets from Data instances.
This class provides methods to convert data into SQL databases or vector representations, and to perform queries on the processed data.
- vectors
The TF-IDF vector representations of the processed content.
- vectorizer
The TfidfVectorizer instance used for creating vectors.
- chunks
The text chunks created from the processed content.
- static data_as_db(data, additional_information=None, sqlite_file=None, if_exists='replace')
Process the content of multiple Data instances into an in-memory SQLite database.
This method downloads the content of each Data (if it’s a CSV) and converts it to an in-memory SQLite database. The resulting database engine is then returned for use with language models.
- Parameters:
data (list[nyx_client.data.Data]) – A list of Data instances to process.
additional_information (VectorResult | None) – Additional information to be stored in the DB as a fallback.
sqlite_file (str | None) – Path to a file for the database to reside in.
if_exists (Literal['fail', 'replace', 'append']) – What to do if a table already exists: one of “fail”, “replace”, or “append”. Defaults to “replace”.
- Returns:
An SQLAlchemy engine.Engine instance for the in-memory SQLite database.
- Return type:
sqlalchemy.engine.Engine
Note
If the list of data is empty, an empty database engine is returned.
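The three if_exists modes can be illustrated with a small, self-contained sketch. This is not the library's implementation: it uses the standard-library sqlite3 and csv modules rather than SQLAlchemy, stores every column as TEXT, and load_csv_table and its arguments are hypothetical names chosen for this example.

```python
import csv
import io
import sqlite3


def load_csv_table(conn, table, csv_text, if_exists="replace"):
    """Hypothetical helper: load CSV text into a SQLite table.

    Mirrors the "fail" / "replace" / "append" modes; returns the number
    of data rows inserted.
    """
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"unsupported if_exists value: {if_exists!r}")

    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return 0  # nothing to load
    header, body = rows[0], rows[1:]

    # Check whether the table is already present in this database.
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone() is not None

    if exists and if_exists == "fail":
        raise ValueError(f"table {table!r} already exists")
    if exists and if_exists == "replace":
        conn.execute(f'DROP TABLE "{table}"')
        exists = False
    if not exists:
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')

    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)
    conn.commit()
    return len(body)


conn = sqlite3.connect(":memory:")
csv_text = "city,population\nOslo,700000\nBergen,290000\n"
load_csv_table(conn, "cities", csv_text)                      # creates the table
load_csv_table(conn, "cities", csv_text, if_exists="append")  # adds two more rows
row_count = conn.execute('SELECT COUNT(*) FROM "cities"').fetchone()[0]
```

With "append" the second call adds to the existing rows, whereas "replace" would drop and recreate the table first.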
- static normalise_values(values)
Normalise names in a list of values.
- Parameters:
values (collections.abc.Sequence[str]) – A sequence of values to normalise.
- Returns:
A list of normalised values.
- Return type:
- data_as_vectors(data, chunk_size=1000)
Process the content of multiple Data instances into vector representations.
This method downloads the content of each Data, combines it, chunks it, and creates a TF-IDF vectorizer for the chunks.
- Parameters:
data (collections.abc.Sequence[nyx_client.data.Data]) – A sequence of Data instances to process.
chunk_size (int) – The size of each chunk when splitting the content. Defaults to 1000.
- Returns:
The current Parser instance with updated vectors, vectorizer, and chunks.
Note
If no content is found in any of the data, the method returns without processing.
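The combine, chunk, and vectorize steps described above can be sketched in plain Python. Several assumptions apply: chunks are taken to be fixed-size character slices (the documentation does not say how chunk_size is measured), the weighting is a simplified TF-IDF rather than scikit-learn's TfidfVectorizer, and chunk_text, tokenize, and fit_tfidf are hypothetical names used only here.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=1000):
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_tfidf(chunks):
    """Return (idf, vectors): an IDF table plus one sparse vector per chunk."""
    tokenized = [tokenize(chunk) for chunk in chunks]
    n = len(chunks)
    # Document frequency: in how many chunks does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so a term present in every chunk keeps a non-zero weight.
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return idf, vectors


contents = ["Oslo is the capital of Norway. ", "Bergen is known for rain."]
combined = "".join(contents)              # combine the downloaded content
chunks = chunk_text(combined, chunk_size=20)
idf, vectors = fit_tfidf(chunks)          # one sparse vector per chunk
```

Each entry of vectors maps a term to its weight within one chunk, which is the shape a similarity search over chunks needs.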
- query(text, k=3)
Query the processed data with a given text.
This method transforms the input text into a vector using the fitted vectorizer, and then finds the most similar chunks to this query vector.
- Parameters:
text – The query text.
k (int) – The number of top matching chunks to return. Defaults to 3.
- Returns:
An object containing the top k matching chunks, their similarities, and associated metadata. If the vectorizer is not initialized, it returns a VectorResult indicating failure.
- Return type:
VectorResult
Note
This method assumes that self.vectorizer has been properly initialized. If self.vectorizer is None, it returns a VectorResult indicating failure.
- find_matching_chunk(query_vector, k=3)
Find the most similar chunks to the query vector.
This method computes the cosine similarity between the query vector and all document vectors, then returns the top k most similar chunks along with their similarities and metadata.
- Parameters:
query_vector (Any) – The vector representation of the query.
k (int) – The number of top matching chunks to return. Defaults to 3.
- Returns:
An object containing the top k matching chunks, their similarities, and associated metadata. If no vectors are available, it returns a VectorResult with empty lists and a failure message.
- Return type:
VectorResult
Note
This method assumes that self.vectors, self.chunks, and self.metadata have been properly initialized. If self.vectors is None, it returns a VectorResult indicating failure.
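A minimal sketch of the cosine-similarity ranking described above, including the failure path when no vectors exist. It assumes sparse vectors stored as dicts; SketchResult is a hypothetical stand-in for VectorResult (whose actual fields are not documented here), and cosine and find_top_k are illustrative names, not part of the library.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SketchResult:
    """Hypothetical stand-in for VectorResult; not the real class."""
    chunks: list = field(default_factory=list)
    similarities: list = field(default_factory=list)
    message: str = ""


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def find_top_k(query_vector, vectors, chunks, k=3):
    """Rank chunks by cosine similarity to the query and keep the best k."""
    if vectors is None:
        # Failure path: empty lists plus an explanatory message.
        return SketchResult(message="no vectors available")
    scored = [(cosine(query_vector, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    return SketchResult(
        chunks=[chunks[i] for _, i in top],
        similarities=[score for score, _ in top],
    )


chunks = ["oslo capital norway", "bergen rain", "norway fjords"]
vectors = [
    {"oslo": 1.0, "capital": 1.0, "norway": 1.0},
    {"bergen": 1.0, "rain": 1.0},
    {"norway": 1.0, "fjords": 1.0},
]
result = find_top_k({"norway": 1.0, "fjords": 1.0}, vectors, chunks, k=2)
empty = find_top_k({"norway": 1.0}, None, chunks)
```

Returning a result object on the failure path, rather than raising, lets callers handle the "not yet processed" case without a try/except.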