In this week’s quick GenAI reads, we’d like to share a thought on enhancing QA performance using proprietary data.
Traditionally, the approach involves vectorizing data and conducting a similarity search.
However, an alternative method could be employed:
1. Process all documents through a language model, generating diverse question types.
For instance:
– “You’re going to Bangalore next week. It’s the ultimate destination.”
– Questions generated might be:
– “Where are you traveling to next week?”
– “What’s considered the best place globally?”
– “Is the upcoming trip to Bangalore?”
– “Is Bangalore truly the best place?”
2. Vectorize these questions, not the documents themselves.
3. When a new question arises, perform a similarity search on these question vectors.
4. Note that the similarity search will always return the nearest question vectors, whether or not they genuinely match the query.
5. Submit these questions, along with the original question, to the language model.
6. Have the language model judge whether any of the retrieved questions ask the same thing as the query, or something close to it.
7. If so, have the language model return the matching questions.
8. Use the source file attached as metadata to each vectorized question to locate the underlying document(s).
9. Provide the entire document and question to the language model to formulate an answer.
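The steps above can be sketched end to end. This is a toy illustration, not a production implementation: a bag-of-words counter stands in for a real embedding model, the generated questions are hard-coded rather than produced by a language model, and the filename `trip_note.txt` is invented for the example.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts (stand-in for a real embedding model).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 1: questions a language model might generate for the example document
# (hard-coded here; in practice these come from an LLM call per document).
generated = [
    "Where are you traveling to next week?",
    "What's considered the best place globally?",
    "Is the upcoming trip to Bangalore?",
    "Is Bangalore truly the best place?",
]

# Steps 2 and 8: vectorize the questions, not the document,
# attaching the source file as metadata to each entry.
index = [
    {"question": q, "vector": embed(q), "source": "trip_note.txt"}
    for q in generated
]

# Steps 3-4: similarity search over the question vectors for a new query.
def search(query, k=2):
    qv = embed(query)
    ranked = sorted(index, key=lambda e: cosine(qv, e["vector"]), reverse=True)
    return ranked[:k]

hits = search("Where am I traveling next week?")

# Steps 5-9 would then send the retrieved questions plus the original query
# to the language model, and use each hit's "source" metadata to fetch the
# full document for answer generation.
sources = {h["source"] for h in hits}
```

Swapping in a real embedding model and an LLM call for question generation leaves the structure unchanged; only `embed` and the `generated` list are placeholders.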
The aim here is to achieve higher-quality retrieval by offering comprehensive context.
At its core, the principle behind this method is:
For any given document set, there are only so many ways a question can reasonably be asked of it.