Sampled

Sampled batches operate similarly to random batches but with one key difference: they use "sketches" to generate query vectors. FirstBatch creates these sketches when you add vector databases to your instance. A sketch is an ML model trained to understand your vector index's data distribution and produce samples that mimic vectors from that original distribution.

Random query vectors may yield cosine similarity scores as low as about 0.001, whereas sampled batches can reach scores of up to 0.8. This gives SDK users more topic-specific query vectors, and so a more nuanced initial batch than random batches provide.
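
The gap between random and sampled query vectors can be illustrated with a small toy experiment. This is only a sketch under stated assumptions: the "sketch" below is a plain per-dimension mean and standard deviation rather than the ML model FirstBatch trains, the index is synthetic, and none of the names come from the FirstBatch SDK.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, index):
    """Highest cosine similarity between the query and any indexed vector."""
    return max(cosine(query, v) for v in index)


rng = random.Random(0)
DIM = 64

# Synthetic index: 200 vectors scattered tightly around one topic centroid.
centroid = [rng.gauss(0, 1) for _ in range(DIM)]
index = [[c + rng.gauss(0, 0.3) for c in centroid] for _ in range(200)]

# Toy stand-in for a sketch (assumption): per-dimension mean and std of the index.
columns = list(zip(*index))
means = [sum(col) / len(col) for col in columns]
stds = [
    math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
    for col, m in zip(columns, means)
]

# A random query ignores the index; a sampled query is drawn from the toy sketch.
random_query = [rng.gauss(0, 1) for _ in range(DIM)]
sampled_query = [rng.gauss(m, s) for m, s in zip(means, stds)]

random_score = best_match(random_query, index)
sampled_score = best_match(sampled_query, index)
print(f"random query best match:  {random_score:.3f}")
print(f"sampled query best match: {sampled_score:.3f}")
```

The sampled query lands inside the region the indexed vectors occupy, so its best match scores much higher than that of the random query. The exact numbers depend on the synthetic data and are not the figures quoted above.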

Accepted parameters for Sampled batches are:

n_topics: int = 0
remove_duplicates: bool = True
apply_threshold: Tuple[bool, float] = (False, 0.0)
apply_mmr: bool = False
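
For reference, the parameters and defaults listed above can be mirrored in a small typed container. `SampledBatchParams` is a hypothetical name used only for illustration, not a class from the FirstBatch SDK; how the SDK actually accepts these values is not shown here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SampledBatchParams:
    """Hypothetical container mirroring the documented parameters and defaults."""

    n_topics: int = 0
    remove_duplicates: bool = True
    apply_threshold: Tuple[bool, float] = (False, 0.0)
    apply_mmr: bool = False


# Defaults match the list above.
defaults = SampledBatchParams()

# Overriding a few values; the specific numbers are arbitrary examples.
custom = SampledBatchParams(n_topics=5, apply_threshold=(True, 0.5), apply_mmr=True)

print(defaults)
print(custom)
```

Reading `apply_threshold` as an (enabled, value) pair is an assumption based on its type signature.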
