microsoft/promptbench
A unified evaluation framework for large language models
Language:Python
Total stars: 857
Stars trend:
25 Dec 2023
7pm ██▍ +19
8pm █▏ +9
9pm █▍ +11
10pm █▌ +12
11pm █▍ +11
26 Dec 2023
12am █▌ +12
1am ██▌ +20
2am █▉ +15
3am █▌ +12
4am ▉ +7
5am █▎ +10
6am ██ +16
#python
#adversarialattacks, #benchmark, #chatgpt, #evaluation, #largelanguagemodels, #prompt, #promptengineering, #robustness
langfuse/langfuse
🪢 Open source LLM engineering platform. Observability, metrics, evals, prompt management, testing, prompt playground, datasets, LLM evaluations -- 🍊YC W23 🤖 integrate via TypeScript, Python / Decorators, OpenAI, LangChain, LlamaIndex, LiteLLM, Instructor, Mistral, Perplexity, Claude, Gemini, Vertex
Language:TypeScript
Total stars: 3172
Stars trend:
27 Apr 2024
7am ▏ +1
8am +0
9am ▏ +1
10am ▎ +2
11am █▍ +11
12pm █▎ +10
1pm ▋ +5
2pm █▏ +9
3pm █▋ +13
4pm █▏ +9
5pm ▉ +7
6pm █ +8
#typescript
#analytics, #evaluation, #gpt, #langchain, #largelanguagemodels, #llamaindex, #llm, #llmagent, #llmframework, #llmops, #monitoring, #observability, #opensource, #openai, #playground, #promptengineering, #promptmanagement, #selfhosted, #ycombinator
Helicone/helicone
🧊 Open source LLM-Observability Platform for Developers. One-line integration for monitoring, metrics, evals, agent tracing, prompt management, playground, etc. Supports OpenAI SDK, Vercel AI SDK, Anthropic SDK, LiteLLM, LlamaIndex, LangChain, and more. 🍓 YC W23
Language:TypeScript
Total stars: 2410
Stars trend:
20 Dec 2024
9am ▏ +1
10am +0
11am █▍ +11
12pm █▍ +11
1pm █▏ +9
2pm ▉ +7
3pm ▌ +4
4pm █▏ +9
5pm █▏ +9
6pm ▍ +3
7pm ▍ +3
8pm █▏ +9
#typescript
#agentmonitoring, #analytics, #evaluation, #gpt, #langchain, #largelanguagemodels, #llamaindex, #llm, #llmcost, #llmevaluation, #llmobservability, #llmops, #monitoring, #opensource, #openai, #playground, #promptengineering, #promptmanagement, #ycombinator
langwatch/langwatch
The ultimate LLM Ops platform - Monitoring, Analytics, Evaluations, Datasets and Prompt Optimization ✨
Language:TypeScript
Total stars: 822
Stars trend:
16 Jan 2025
4pm ▊ +6
5pm █▍ +11
6pm █▌ +12
7pm █▎ +10
8pm ▋ +5
9pm ▉ +7
10pm █ +8
11pm ▎ +2
17 Jan 2025
12am ▌ +4
1am ▊ +6
2am ▋ +5
#typescript
#ai, #analytics, #datasets, #dspy, #evaluation, #gpt, #llm, #llmops, #lowcode, #observability, #openai, #promptengineering
promptfoo/promptfoo
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
Language:TypeScript
Total stars: 6971
Stars trend:
31 May 2025
9pm ▌ +4
10pm ▋ +5
11pm ▏ +1
1 Jun 2025
12am █ +8
1am ▋ +5
2am ▊ +6
3am █▏ +9
4am ▊ +6
5am ▉ +7
6am █▏ +9
7am ▍ +3
8am █▋ +13
#typescript
#ci, #cicd, #evaluation, #evaluationframework, #llm, #llmeval, #llmevaluation, #llmevaluationframework, #llmops, #pentesting, #promptengineering, #prompttesting, #prompts, #rag, #redteaming, #testing, #vulnerabilityscanners
deepsense-ai/ragbits
Building blocks for rapid development of GenAI applications
Language:Python
Total stars: 275
Stars trend:
4 Jun 2025
12pm ▎ +2
1pm ▍ +3
2pm █▋ +13
3pm ██▏ +17
4pm █▏ +9
5pm █▊ +14
6pm █▊ +14
7pm █▋ +13
8pm █▏ +9
#python
#agents, #documentsearch, #evaluation, #guardrails, #llms, #optimization, #prompts, #rag, #vectorstores