+search
150 subscribers
41 photos
6 files
153 links
Search, Apache Lucene, Apache Solr, Elasticsearch, OpenSearch, Vespa, Qdrant, etc

discuss https://t.me/+-2h4V8vi-eYwZTcy
I've always wondered how https://qdrant.tech/documentation/concepts/indexing/#sparse-vector-index identifies terms. Note: Lucene orders terms per segment and assigns term ordinals sequentially, which makes them incomparable across segments and indices.
I'd thought that sparse vector solutions pull some solid term dictionary from Hugging Face. However, it just does this: https://github.com/qdrant/fastembed/blob/main/fastembed/sparse/bm25.py#L303
🥰2
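To make the contrast concrete, here is a minimal sketch of two ways to number terms. It assumes the dictionary-free option is to derive the id from the token itself with a hash; whether the linked fastembed line does exactly this is an assumption here. The names `term_id`, `sparse_vector` and `segment_ordinals` are hypothetical, and `hashlib.blake2b` is only a stand-in hash from the standard library.

```python
import hashlib
from collections import Counter


def term_id(token: str) -> int:
    """Map a token to a stable 32-bit id derived only from its bytes (hypothetical helper)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big")


def sparse_vector(tokens: list[str]) -> dict[int, float]:
    """Build a {term_id: term_frequency} map; real BM25 weighting is omitted."""
    counts = Counter(term_id(t) for t in tokens)
    return {tid: float(n) for tid, n in counts.items()}


def segment_ordinals(terms: set[str]) -> dict[str, int]:
    """Lucene-like illustration: an ordinal is the position in one segment's sorted term list."""
    return {term: i for i, term in enumerate(sorted(terms))}


# The same term gets different ordinals in two segments with different vocabularies...
seg_a = segment_ordinals({"lucene", "qdrant", "solr"})
seg_b = segment_ordinals({"elasticsearch", "opensearch", "qdrant"})

# ...while the hashed id depends only on the token, so any process computes the same value.
vector = sparse_vector(["qdrant", "sparse", "qdrant"])
```

The price of skipping the dictionary is that two different tokens can hash to the same id, and the id cannot be mapped back to the term without keeping the tokens somewhere.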
Feeling conscience-stricken and disgusted inside
1😁1
I noticed long ago that the #architecture of the solutions I build structures itself into a conveyor-style pipeline,... https://telegra.ph/How-often-do-you-think-about-Roman-Empire-12-13
👍1
#yandexcloud #functions: a top question: how to download Cloud Functions code that was hastily thrown together in the fancy editor. Shame on you. OK, here's a rescue toolkit. Just don't publish your keys!
Ran a one-shot benchmark parsing a 64-page PDF with text and images on a small CPU instance:
- Apache Tika (since 2007!!)
- Docling, recently open-sourced by IBM
9 seconds vs 6 minutes. Quality is on par. The technological singularity we deserve.
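For anyone who wants to repeat a one-shot comparison like this, a minimal timing sketch follows. It is not the script used for the numbers above. `time_once`, `compare` and `fake_parser` are hypothetical names, and the real Tika and Docling calls are left out on purpose: each would be wrapped in a function that takes a file path and returns the extracted text.

```python
import time
from typing import Callable


def time_once(parse: Callable[[str], str], path: str) -> tuple[float, int]:
    """Run one parser once; return (elapsed seconds, length of extracted text)."""
    start = time.perf_counter()
    text = parse(path)
    return time.perf_counter() - start, len(text)


def compare(parsers: dict[str, Callable[[str], str]], path: str) -> dict[str, tuple[float, int]]:
    """Time each parser once on the same file."""
    return {name: time_once(fn, path) for name, fn in parsers.items()}


# Stand-in parser so the sketch runs without Tika or Docling installed.
def fake_parser(path: str) -> str:
    return f"text extracted from {path}"


results = compare({"fake": fake_parser}, "sample.pdf")
for name, (seconds, chars) in results.items():
    print(f"{name}: {seconds:.3f} s, {chars} chars")
```

A single run also counts any one-time startup cost, so it answers "how long until I get the text from a cold start" rather than steady-state throughput.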
same here
So, without much enjoyment, I ran some containers behind API Gateway @ #Yandex #Cloud https://vk.com/@mkhl_spb-zapusk-nebolshih-demoprilozhenii-v-yandex-cloud
I attached the images separately for those who don't log in to VK.
👍1