Continuous Learning_Startup & Investment
We journey together through the captivating realms of entrepreneurship, investment, life, and technology. This is my chronicle of exploration, where I capture and share the lessons that shape our world. Join us and let's never stop learning!
Chip์ด ์ผํ•˜๊ธฐ๋„ ํ–ˆ๋˜ Snorkel์ด ๊ถ๊ธˆํ•ด์„œ ์ฐพ์•„๋ณด๋‹ˆ ์ด๋Ÿฐ ํšŒ์‚ฌ๊ตฐ์š” ใ…Žใ…Ž

Solving the Last Mile Problem of Foundation Models with Data-Centric AI
Everyone will soon be using foundation models (FMs) like GPT-4.

๋™์˜์ƒ์—์„œ Snorkel AI์˜ CEO์ธ ์•Œ๋ ‰์Šค ๋ž˜ํŠธ๋„ˆ๋Š” ๊ธฐ์ดˆ ๋ชจ๋ธ์˜ โ€˜๋ผ์ŠคํŠธ ๋งˆ์ผโ€™ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐ ์žˆ์–ด ๋ฐ์ดํ„ฐ ์ค‘์‹ฌ AI์˜ ์—ญํ• ๊ณผ ๋งž์ถคํ˜• ๋ฐ ๋„๋ฉ”์ธ๋ณ„ ๋ชจ๋ธ ๊ฐœ๋ฐœ์˜ ์ค‘์š”์„ฑ์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Š” ๋…์ ์ ์ธ ๋„๋ฉ”์ธ๋ณ„ ๋ฐ์ดํ„ฐ๋กœ ํ•™์Šต๋œ ๊ธฐ์ดˆ ๋ชจ๋ธ์„ ๋‚˜ํƒ€๋‚ด๋Š” โ€˜GPT-Youโ€™๋ผ๋Š” ๊ฐœ๋…์„ AI ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์˜ ๋ฏธ๋ž˜๋กœ ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค.

He emphasizes that the essential and most important aspect of AI development is not choosing the right architecture or fine-tuning model parameters, but manipulating, curating, labeling, slicing, sampling, and developing the data. It is this data-centric approach that makes it possible to build specialized, high-performing foundation models.

๋˜ํ•œ ๋ผํŠธ๋„ˆ๋Š” โ€˜๋ผ์ŠคํŠธ ๋งˆ์ผโ€™ ๋ฌธ์ œ์— ๋Œ€ํ•œ ํ•ด๊ฒฐ์ฑ…์œผ๋กœ์„œ ๋ฐ์ดํ„ฐ ์ค‘์‹ฌ AI์˜ ๊ฐœ๋…์— ๋Œ€ํ•ด ์ž์„ธํžˆ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ์ค‘์‹ฌ AI๋Š” ๋ผ๋ฒจ๋ง, ์ƒ˜ํ”Œ๋ง, ํ๋ ˆ์ดํŒ…, ์ฆ๊ฐ• ๋“ฑ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ˜๋ณตํ•˜์—ฌ ํŠน์ • ์ž‘์—…์„ ์œ„ํ•œ ์ „๋ฌธํ™”๋œ ๋ชจ๋ธ์„ ๋งŒ๋“œ๋Š” ๋ฐ ์ค‘์ ์„ ๋‘ก๋‹ˆ๋‹ค. ๊ทธ๋Š” ์ €๋ ดํ•œ ์ฟผ๋ฆฌ ์„ธํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ChatGPT๋ฅผ ๋ณต์ œํ•˜๋Š” ๋ฐ ์„ฑ๊ณตํ•œ Alpaca์™€ ๊ฐ™์€ ํ”„๋กœ์ ํŠธ์˜ ์„ฑ๊ณต๊ณผ ๋‚ด๊ตฌ์„ฑ ์žˆ๋Š” ๋ชจ๋ธ์„ ๋งŒ๋“œ๋Š” ๋ฐ ์žˆ์–ด ๊ฐœ์ธ ๋ฐ์ดํ„ฐ ๋ฐฐํฌ์˜ ์ค‘์š”์„ฑ์— ๋Œ€ํ•ด ๊ฐ•์กฐํ•ฉ๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐ ์ค‘์‹ฌ AI์˜ ์ ์šฉ์— ๋Œ€ํ•ด ๋ž˜ํŠธ๋„ˆ๋Š” ๋ฐ์ดํ„ฐ์˜ ๊ณ ์œ ์„ฑ๊ณผ ๋ชจ๋ธ์˜ ์ •ํ™•๋„๋ผ๋Š” ๋‘ ๊ฐ€์ง€ ์ฐจ์›์œผ๋กœ ๋‚˜๋ˆ„์–ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. AI๋ฅผ ๊ฐ€์žฅ ๋ณด๋žŒ ์žˆ๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ๊ธฐ์ดˆ ๋ชจ๋ธ์— ํ•™์Šต๋œ ๊ฒƒ๊ณผ ๋งค์šฐ ์œ ์‚ฌํ•˜๊ณ  ์˜ค๋ฅ˜์— ๋Œ€ํ•œ ํ—ˆ์šฉ ์˜ค์ฐจ๊ฐ€ ๋†’์€ ๊ฒฝ์šฐ์ž…๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋น„ํ‘œ์ค€ ๋ฐ์ดํ„ฐ์™€ ๋†’์€ ์ •ํ™•๋„๊ฐ€ ์š”๊ตฌ๋˜๋Š” ์ƒํ™ฉ์—์„œ๋Š” ๋ ˆ์ด๋ธ”์ด ์ง€์ •๋œ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์ƒ๋‹นํ•œ ์–‘์˜ ์ž‘์—…๊ณผ ๋ฏธ์„ธ ์กฐ์ •์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

Ratner introduces Snorkel Flow, a platform for developing foundation models with data-centric AI. The workflow starts from a base foundation model, defines a specific task, applies the foundation model to the data, and leads into guided error analysis. The process identifies the base model's errors, corrects them using programmatic labeling, and then either updates the base model with the corrected and augmented labeled data for fine-tuning or distills it into a smaller task-specific model.
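To make the programmatic-labeling step concrete, here is a minimal sketch in plain Python. This is illustrative only, not Snorkel Flow's actual API: a few noisy heuristics vote on each example, and the votes are combined into a training label.

```python
# Minimal sketch of programmatic labeling (illustrative; a real label
# model estimates each function's accuracy instead of a majority vote).
ABSTAIN, NEG, POS = -1, 0, 1

def lf_contains_refund(text):   # heuristic 1: refund talk -> complaint
    return POS if "refund" in text.lower() else ABSTAIN

def lf_exclamation(text):       # heuristic 2: many '!' -> complaint
    return POS if text.count("!") >= 2 else ABSTAIN

def lf_thanks(text):            # heuristic 3: gratitude -> not a complaint
    return NEG if "thank" in text.lower() else ABSTAIN

LFS = [lf_contains_refund, lf_exclamation, lf_thanks]

def label(text):
    # Majority vote over the non-abstaining labeling functions.
    votes = [v for v in (lf(text) for lf in LFS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return POS if sum(v == POS for v in votes) >= len(votes) / 2 else NEG

print(label("I want a refund!!"))   # -> 1 (POS)
```

The resulting labels are what would feed the fine-tuning or distillation step in a workflow like the one described above.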

๊ฒฐ๋ก ์ ์œผ๋กœ ๋ผํŠธ๋„ˆ๋Š” ๋ฐ์ดํ„ฐ ์ค‘์‹ฌ AI์˜ ์ค‘์š”์„ฑ๊ณผ ์ด๊ฒƒ์ด ๊ธฐ์ดˆ ๋ชจ๋ธ์˜ ๊ฐœ๋ฐœ ๋ฐ ๋ฏธ์„ธ ์กฐ์ •์— ๊ฐ€์ ธ๋‹ค์ฃผ๋Š” ๊ฐ€์น˜๋ฅผ ๊ฐ•์กฐํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Š” ์ด๋Ÿฌํ•œ ์›์น™์— ๋Œ€ํ•œ ๋” ๋งŽ์€ ํƒ๊ตฌ์™€ ์ ์šฉ์„ ์žฅ๋ คํ•˜๋ฉฐ, ๋ฐ์ดํ„ฐ ์ค‘์‹ฌ AI์™€ ๊ธฐ์ดˆ ๋ชจ๋ธ์˜ ๋ฏธ๋ž˜์— ์ดˆ์ ์„ ๋งž์ถ˜ ํ–ฅํ›„ ์ปจํผ๋Ÿฐ์Šค์— ์ฒญ์ค‘์„ ์ดˆ๋Œ€ํ•ฉ๋‹ˆ๋‹ค.
Our AI and ML Predictions for 2023: Arize AI
With prediction season coming to a close, here are a few hot takes on some prevailing trends and a few educated guesses on the year ahead.

1. Prompt Engineering Will Grow Into a Field: Prompt engineering doesn't feel much like data science, but prompting is a growing area of significant importance with large language models (LLMs). In a way, finding the right prompt to get the right response turns a data science fine-tuning problem into a fast and iterative prompt testing problem. There are already entire products such as GitHub Copilot that are built on top of prompt engineering with OpenAI, combining a well-executed product integration with GitHub and a data silo advantage. LLM providers such as OpenAI will likely release pre-set prompts to help you ask the right questions. Word of caution: products built on prompts alone are not defensible.
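To make "fast and iterative prompt testing" concrete, here is a minimal harness sketch. `call_llm` is a hypothetical stand-in for whatever completion API you use, and the keyword metric is deliberately crude:

```python
def keyword_score(output: str, keywords: list[str]) -> float:
    # Crude proxy metric: fraction of expected keywords found in the output.
    return sum(k.lower() in output.lower() for k in keywords) / len(keywords)

def evaluate_prompts(templates, dataset, call_llm):
    # dataset: list of (input_text, expected_keywords) pairs.
    results = {}
    for template in templates:
        scores = [keyword_score(call_llm(template.format(text=text)), keywords)
                  for text, keywords in dataset]
        results[template] = sum(scores) / len(scores)
    return results  # keep the best-scoring template, tweak it, re-run

templates = [
    "Summarize this support ticket in one sentence: {text}",
    "You are a support triage bot. One-sentence summary of: {text}",
]
```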

2. A Multi-Modal LLM Will Go Mainstream: An LLM that supports both image or video and language together will be released this year to great fanfare, possibly even eclipsing the buzz around ChatGPT. The power of having both modalities in a single model is underappreciated right now and will be as groundbreaking for new use cases as ChatGPT has been for conversational assistants.

3. Multiple Successful Vertical AI Assistants Will Emerge: We are seeing vertically-focused LLMs with the right prompt construction, software interface, and the right data creating focused value. The best execution to date has been GitHub Copilot. There will likely be multiple vertical AI assistants in fields like law, medicine, and biotech. These will be built by connecting an LLM to unique vertical-specific datasets, fine-tuning and aligning it for specific verticals, engineering prompts for that vertical, and offering a user interface with workflows for that vertical. I believe these can be defensible if done right.

4. ChatGPT Will Not Threaten Google; It's an Entirely New Large Market: It is very normal to see something new and try to think of it as a product replacement for what you already know. This is often the wrong way to look at it. I view ChatGPT as a whole new type of technology that enables a wide swath of products that have nothing to do with search. I think the network effects of search will be near impossible to dislodge in the near term, and the value replacement is just not there yet. That said, large new markets are enabled by the progress in LLMs. It's also likely that a sizable portion of traffic distribution will occur through product integrations like Copilot rather than users arriving at a single home page and typing queries there.

5. Embedding Use for Interpretability and Content Control for Models/LLMs Will Accelerate: The use of embeddings analysis for AI interpretability will grow as a field. Toolsets will launch that use these embeddings to monitor and control AI. We have already seen incredible uptake of our embedding drift solutions across a wide swath of industries. We've also seen embeddings used for content protection in DALL-E and other generative models. Embeddings represent the latent structure models have learned, and they are the backbone of every modern deep learning model.
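For a rough sense of what embedding-drift monitoring means, here is a toy sketch (Arize's production drift metrics are richer than this; any alert threshold would be calibrated on historical windows):

```python
import numpy as np

def centroid_drift(reference: np.ndarray, production: np.ndarray) -> float:
    # Simplest possible drift signal: distance between the mean embedding
    # of the training/reference window and the live-traffic window.
    return float(np.linalg.norm(reference.mean(axis=0) - production.mean(axis=0)))

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=(1000, 128))   # reference embeddings
live = rng.normal(0.3, 1.0, size=(1000, 128))  # shifted live embeddings
print(centroid_drift(ref, live))               # large value -> investigate
```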
I'm attending the MLSys 2023 conference. Let me introduce a few papers that caught my eye.
https://arxiv.org/abs/2305.02538 / Cuttlefish: Low-Rank Model Training without All the Tuning
๊ฐœ์ธ์ ์œผ๋กœ๋Š” ๋ณ„๋กœ ์ข‹์•„ํ•˜๋Š” ์—ฐ๊ตฌ๋Š” ์•„๋‹™๋‹ˆ๋‹ค๋งŒ, Low-Rank ๋ชจ๋ธ Training์— ๊ด€์‹ฌ์ด ๋งŽ์œผ์‹  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. Low-rank traning๊ณผ ์ž๋™ํ™”๋ฅผ ์ข€ ์„ž์€๊ฒƒ ๊ฐ™๊ณ , LLM์€ ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ, BERT์ •๋„๊นŒ์ง€๋Š” ์ข‹์€๊ฒฐ๊ณผ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
https://arxiv.org/abs/2211.05102 / Efficiently Scaling Transformer Inference
๊ตฌ๊ธ€์˜ ์œ ๋ช…ํ•œ ๋…ผ๋ฌธ์ž…๋‹ˆ๋‹ค. Outstanding Paper ์ƒ์„ ๋ฐ›์•˜๊ตฌ์š”. PaLM์˜ ์„œ๋น™ ์‹œ์Šคํ…œ์„ ๋‹ค๋ฃจ๊ณ  ์žˆ๋Š” ๋…ผ๋ฌธ์ž…๋‹ˆ๋‹ค. TPU ์ตœ์ ํ™”๋œ ๋‚ด์šฉ์ด๋ผ์„œ, GPUํ–ฅ์—์„œ๋Š” ์•ฝ๊ฐ„ ๋‹ค๋ฅผ์ˆ˜ ์žˆ๋‹ค๋Š” ํ•จ์ •์ด์ง€๋งŒ, ๋ฐฐ์šธ ์ˆ˜ ์žˆ๋Š” ๋‚ด์šฉ์ด ๋งŽ์Šต๋‹ˆ๋‹ค. ์ตœ๊ทผ ๊ตฌ๊ธ€์—์„œ Grouped-Query Attention์ด ๋‚˜์™”๋Š”๋ฐ, ์–˜๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ Multi-query attention (MQA)๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ๊ณ , Feed forward ๋ฅผ ์œ„ํ•œ ํŒŒํ‹ฐ์…”๋‹ ์ „๋žต์„ ์„ค๋ช…ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
One of the most important things in LLM serving, I think, is understanding the difference between latency-oriented and throughput-oriented optimization. You can learn a lot about that from this paper.
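A toy cost model makes that distinction concrete. The numbers below are assumptions, not measurements: decoding is typically memory-bound, so each step pays a weight-streaming cost the whole batch shares, plus a small per-sequence compute cost.

```python
WEIGHT_LOAD_MS = 20.0   # assumed: per decode step, shared by the batch
PER_SEQ_MS = 1.0        # assumed: marginal compute per sequence

for batch in (1, 4, 16, 64):
    step_ms = WEIGHT_LOAD_MS + PER_SEQ_MS * batch
    tok_per_s = batch * 1000.0 / step_ms
    print(f"batch={batch:3d}  per-token latency={step_ms:5.1f} ms  "
          f"throughput={tok_per_s:7.1f} tok/s")
# Latency-oriented serving keeps batches small so each token arrives fast;
# throughput-oriented serving batches aggressively to amortize weight loads.
```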
https://proceedings.mlsys.org/paper_files/paper/2023/file/4552cedd396a308320209f75f56a5ad5-Paper-mlsys2023.pdf / Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning
I'm curious whether N:M sparsity actually gets good speedups these days.. I'm including it because I'd like to study it myself.
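For anyone else who wants to poke at it, the N:M pattern itself is easy to reproduce. Here is a small NumPy sketch of magnitude-based 2:4 pruning (the pattern NVIDIA's sparse tensor cores target; the paper's GPU kernels for running such weights fast are a separate matter):

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    # 2:4 sparsity: in every contiguous group of 4 weights, zero the 2
    # with the smallest magnitude, keeping at most 2 nonzeros per group.
    groups = w.reshape(-1, 4).copy()
    smallest = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, smallest, 0.0, axis=1)
    return groups.reshape(w.shape)

w = np.random.randn(8, 16)
assert (prune_2_4(w).reshape(-1, 4) != 0).sum(axis=1).max() <= 2
```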
https://proceedings.mlsys.org/paper_files/paper/2023/file/de4086ad4276d895be8ef25ec03c964b-Paper-mlsys2023.pdf / Unified Convolution Framework: A compiler-based approach to support sparse convolutions
ํ•œ๊ตญ์ธ MIT ํ•™์ƒ์˜ ๋ฐœํ‘œ์ธ๋ฐ, Compiler ๊ถ๊ธˆํ•˜์‹  ๋ถ„๋“ค ๋ณด์‹œ์ฃ  ใ…Žใ…Ž

๋„ค์ด๋ฒ„ ๊ถŒ์„ธ์ค‘๋‹˜
What is great friendship? How could we cultivate this?

Reid Hoffman

1. ์นœ๊ตฌ๋“ค์€ ๋‚ด๊ฐ€ ๋ณผ ์ˆ˜ ์—†๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋„๋ก ๋„์™€์ค„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
2. ๋‚ด๊ฐ€ ์นœ๊ตฌ๋“ค์„ ๋•๊ณ  ์นœ๊ตฌ๋“ค์ด ๋‚˜๋ฅผ ๋„์™€์ฃผ๋ฉด ์šฐ๋ฆฌ๋Š” ๋” ์ž˜ํ•˜๊ณ  ๋” ๋ฉ€๋ฆฌ ๊ฐˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
3. ๋‚˜์—๊ฒŒ๋Š” ์ˆ˜๋ฐฑ ๋ช…์˜ ์นœ๊ตฌ, ๋ฉ˜ํ† ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.
4. ์นœ๊ตฌ๋“ค์€ ๋‹น์‹ ์ด ๊ทธ๋“ค์„ ๋•๋„๋ก ํ—ˆ๋ฝํ•ฉ๋‹ˆ๋‹ค. ์นœ๊ตฌ๋“ค์€ ๋Œ€ํ™”๋ฅผ ํ†ตํ•ด ์ €๋ฅผ ์ „์ ์œผ๋กœ ์‹ ๋ขฐํ•˜๊ณ  ๋ฏฟ์–ด์ฃผ์–ด ์ €๋ฅผ ๊ฐ€์น˜ ์žˆ๋Š” ์‚ฌ๋žŒ์œผ๋กœ ๋งŒ๋“ค์–ด์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.
5. ์šฐ์ •์„ ์ตœ์šฐ์„  ์ˆœ์œ„๋กœ ์‚ผ๋Š”๋‹ค.
6. ์˜์‹์ ์œผ๋กœ ์ข‹์€ ์šฐ์ •์„ ๋งŒ๋“ ๋‹ค. ์˜์‹์„ ๋งŒ๋“ ๋‹ค. ์šฐ์ •์ด ๋ฌด์—‡์ด๋ฉฐ ์–ด๋–ป๊ฒŒ ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ์ด์•ผ๊ธฐํ•˜์„ธ์š”.
7. ๋‹น์‹ ์ด ์›ํ•˜๋Š” ๊ฒƒ๊ณผ ๋‹น์‹ ์˜ ๊ฟˆ์„ ์นœ๊ตฌ๋“ค์—๊ฒŒ ๊ณต์œ ํ•˜์„ธ์š”.
8. ์นœ๊ตฌ๋“ค์€ ๋‹น์‹ ์ด ์›ํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ๋‚ด๊ฐ€ ํ•ด์•ผ ํ•  ์ผ์„ ๋งํ•ด์ค๋‹ˆ๋‹ค.
9. ์ธ์ƒ์€ ๊ฐ€์žฅ ์ถฉ๋งŒํ•œ ํŒ€ ์Šคํฌ์ธ ์ž…๋‹ˆ๋‹ค.

---

Charlie Munger and Warren Buffett

๋…์ด ๋˜๋Š” ์‚ฌ๋žŒ์„ ํ”ผํ•˜์„ธ์š”: ๋ฒ„ํ•๊ณผ ๋ฉ๊ฑฐ๋Š” ๋ชจ๋‘ ์‚ถ์— ๋ถ€์ •์ ์ธ ์˜ํ–ฅ์„ ๋ฏธ์น  ์ˆ˜ ์žˆ๋Š” ๋…์„ฑ์ด ์žˆ๋Š” ์‚ฌ๋žŒ๋“ค๊ณผ ๋ฉ€๋ฆฌ ๋–จ์–ด์ ธ ์ง€๋‚ด๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๋‹ค๊ณ  ๊ฐ•์กฐํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ฒ„ํ•์€ ์‚ฌ๋žŒ์„ ๋น„๋ฐฉํ•˜๊ณ  ๋‹ค๋ฆฌ๋ฅผ ๋ถˆํƒœ์šฐ๋Š” ๊ฒƒ์€ ์นœ๊ตฌ๊ฐ€ ์ „ํ˜€ ์—†์„ ์ˆ˜๋„ ์žˆ๋‹ค๊ณ  ๊ฒฝ๊ณ ํ•ฉ๋‹ˆ๋‹ค.

๊ณต์œ ๋œ ๊ฐ€์น˜์™€ ์‹ ๋ขฐ: ๋ฒ„ํ•๊ณผ ๋ฉ๊ฑฐ์˜ ์˜ค๋žœ ์šฐ์ •์€ ๊ณต์œ ๋œ ๊ฐ€์น˜, ์‹ ๋ขฐ, ์ƒํ˜ธ ์กด์ค‘์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค.

์„œ๋กœ์—๊ฒŒ์„œ ๋ฐฐ์šฐ๊ธฐ: ๋ฒ„ํ•์€ ๋ฉ๊ฑฐ์˜ ์ง€์„ฑ, ์œ ๋จธ ๊ฐ๊ฐ, ๋ช…์พŒํ•œ ์‚ฌ๊ณ ๋ ฅ ๋“ฑ ๊ทธ๊ฐ€ ์ค‘์š”ํ•˜๊ฒŒ ์—ฌ๊ธฐ๋Š” ์ž์งˆ์— ๋Œ€ํ•ด ์ž์ฃผ ์–ธ๊ธ‰ํ•ด ์™”์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ๋ฉ๊ฑฐ๋Š” ๋ฒ„ํ•์˜ ํˆฌ์ž ๋Šฅ๋ ฅ๊ณผ ์ง€ํ˜œ์— ๋Œ€ํ•ด ์กด๊ฒฝ์‹ฌ์„ ํ‘œํ•˜๊ธฐ๋„ ํ–ˆ์Šต๋‹ˆ๋‹ค.

---

Naval Ravikant

์žฅ๊ธฐ์ ์ธ ์šฐ์ •์„ ์†Œ์ค‘ํžˆ ์—ฌ๊น๋‹ˆ๋‹ค: Naval์€ ์ƒํ˜ธ ์กด์ค‘๊ณผ ๊ณต์œ ๋œ ๊ฐ€์น˜์— ๊ธฐ๋ฐ˜ํ•œ ์žฅ๊ธฐ์ ์ธ ์šฐ์ •๊ณผ ๋™๋ฃŒ ๊ด€๊ณ„์˜ ์ค‘์š”์„ฑ์„ ๊ฐ•์กฐํ•ฉ๋‹ˆ๋‹ค.

์ •์ง: Naval์— ๋”ฐ๋ฅด๋ฉด ์ •์ง์€ ์šฐ์ •๊ณผ ๊ด€๊ณ„์˜ ๊ธฐ๋ณธ ๊ฐ€์น˜์ž…๋‹ˆ๋‹ค. ์นœ๊ตฌ์—๊ฒŒ ์ •์งํ•˜๊ณ  ํˆฌ๋ช…ํ•˜๊ฒŒ ํ–‰๋™ํ•˜๋ฉด ์‹ ๋ขฐ๋ฅผ ์Œ“๊ณ  ์œ ๋Œ€๊ฐ์„ ๊ฐ•ํ™”ํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋ฉ๋‹ˆ๋‹ค.

Shared interests: Naval believes friendships become richer and more enjoyable when you share interests and passions with your friends.

Independent thinking: On Twitter, Naval has mentioned the importance of thinking independently rather than following the beliefs of friends or peer groups.

Quality over quantity: Naval suggests focusing on forming deep, meaningful relationships with a few close friends rather than accumulating many superficial friendships.
LLM์„ RAG ๋ฐฉ์‹์œผ๋กœ ์จ๋ณด์‹œ๋Š” ์‹œ๋„๋ฅผ ๋งŽ์ด ํ•˜์‹คํ…๋ฐ์š” .
๊ด€๋ จํ•˜์—ฌ ์Šคํƒ ํฌ๋“œ์—์„œ LLM์— ๋„ฃ๋Š” input๋“ค์˜ ์ƒ๋Œ€์  ์œ„์น˜์™€ ๊ด€๋ จํ•ด ๋‹ค์–‘ํ•œ ์‹คํ—˜์„ ํ•œ ํฅ๋ฏธ๋กœ์šด ๋…ผ๋ฌธ์ด ์žˆ์–ด ์ •๋ฆฌํ•ด๋ดค์Šต๋‹ˆ๋‹ค.
๋‹น์—ฐํžˆ ์„œ๋น„์Šค ์ƒํ™ฉ๋งˆ๋‹ค, ์ด์šฉํ•˜๋Š” LLM๋งˆ๋‹ค ์ƒํ™ฉ์ด ๋‹ค๋ฅด์ง€๋งŒ
์‹คํ—˜ ๋ฒ”์œ„๋ฅผ ์ขํž ์ˆ˜ ์žˆ๋Š” ์ข‹์€ ๊ฐ€์ด๋“œ๊ฐ€ ๋˜๊ฒ ๋„ค์š”
(ํ˜น์‹œ ์ด๋Ÿฐ RAG ์‹ค์ „์— ๋„์›€๋˜๋Š” ๋‹ค๋ฅธ ๋…ผ๋ฌธ์„ ์•Œ๊ณ  ๊ณ„์‹œ๋‹ค๋ฉด ๊ณต์œ ํ•ด์ฃผ์‹œ๋ฉด ๊ฐ์‚ฌํ•˜๊ณ˜์Šต๋‹ˆ๋‹ค! ใ…Žใ…Ž )
<์งง์€ (์ œ ๋ง˜๋Œ€๋กœ) ๊ฒฐ๋ก >

RAG ์ ‘๊ทผ์—์„œ๋Š” ranked list truncation ์ „๋žต, ์ฆ‰ ์ˆœ์œ„๊ฐ€ ๋†’์€ document๋ฅผ ์•ž์— ๋„ฃ์–ด์ฃผ๊ณ  ์ผ์ • ์ˆ˜์ค€ ์ดํ•˜ document๋Š” ๋„ฃ์–ด์ฃผ์ง€ ์•Š๋Š” ์ „๋žต์„ ์“ฐ์ž
๋งŒ์•ฝ ์œ„ 1์˜ ์ ‘๊ทผ์ด ์‰ฝ์ง€ ์•Š์€ ๊ฒฝ์šฐ, ์ฆ‰ ์ค‘์š”ํ•œ ๋ถ€๋ถ„์„ ๊ณจ๋ผ๋‚ด๊ธฐ ์–ด๋ ค์šด ๊ฒฝ์šฐ๋Š” GPT-3.5๋ง๊ณ  Claude๋ฅผ ๊ฒ€ํ† ํ•ด ๋ณผ๋ฒ•ํ•˜๋‹ค.
๊ทธ ์ด์œ ๋Š” ๋‹จ์ˆœํžˆ max context length๊ฐ€ ๊ธธ์–ด์„œ๋ผ๊ธฐ ๋ณด๋‹ค๋Š”,
์ค‘์š” ์ •๋ณด ์œ„์น˜์— ๋”ฐ๋ฅธ ๊ฒฐ๊ณผ ํŽธ์ฐจ๊ฐ€ GPT์— ๋น„ํ•ด ์ ์–ด์„œ

decoder-only ๋ชจ๋ธ์„ ์“ฐ๋Š” ๊ฒฝ์šฐ(์˜ˆ: gpt-3.5-turbo)์—๋Š” Query-aware contextualization ๋ฐฉ๋ฒ•, ์ฆ‰ ํ”„๋กฌํ”„ํŠธ์— documents ์ „ํ›„์— query ํ…์ŠคํŠธ๋ฅผ ์ค‘๋ณตํ•ด์„œ ๋„ฃ์–ด์ฃผ๋Š” ๋ฐฉ๋ฒ•์„ ์‹œ๋„ํ•ด๋ณด์ž
-------------
<Summary of the paper>

Performance is higher when the key information is at the beginning or the end.
• [First figure] Varying the relative position of the passage containing the answer, accuracy (in these experiments) differs a lot between front/back positions and the middle (the experiment loop is sketched after this list).
• For GPT-3.5, when the key information sits in the middle, the score can even come out lower than running with no context at all… lol
• This U-shaped pattern shows up whether or not an instruction is included, and it is more pronounced in decoder-only models than in encoder-decoder models.
(That said, this is a tendency; in the end it's model by model. For example, Claude's performance variance with length is far smaller than GPT-3.5's.)
• [Second figure] The phenomenon appears not only in models with supervised instruction tuning but also in models without it, so it doesn't seem to be caused by the instruction sitting at the front during instruction tuning.
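For reference, the position-sweep experiment boils down to a loop like the sketch below. `ask_llm` is a hypothetical stub for whichever model is under test; the paper's actual prompts and datasets differ.

```python
def accuracy_by_position(questions, ask_llm, n_docs=10):
    # questions: dicts with "text", "answer", "gold_doc", "distractors".
    hits = [0] * n_docs
    for q in questions:
        for pos in range(n_docs):
            docs = list(q["distractors"][: n_docs - 1])
            docs.insert(pos, q["gold_doc"])  # slide the answer passage around
            reply = ask_llm(q["text"], docs)
            hits[pos] += q["answer"].lower() in reply.lower()
    return [h / len(questions) for h in hits]  # U-shape = middle positions lose
```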

decoder-only ๋ชจ๋ธ์—์„œ Query-aware contextualization ๋ฐฉ๋ฒ•์ด ํšจ๊ณผ ์žˆ๋‹ค.
โ€ข Query-aware contextualization ๋ž€ ํ”„๋กฌํ”„ํŠธ์—์„œ Query ํ…์ŠคํŠธ๋ฅผ documents ์ „ํ›„๋กœ ๋„ฃ์–ด์ฃผ๋Š” ๋ฐฉ๋ฒ•
โ€ข ์ฆ‰ 1๋ฒˆ ๊ฒฐ๊ณผ์—์„œ ๋ณด๋“ฏ ์ •๋ณด์˜ ์œ„์น˜๊ฐ€ ํผํฌ๋จผ์Šค์— ์˜ํ–ฅ์„ ๋งŽ์ด ๋ผ์น˜๋Š” decoder-only ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ์—๋Š” query ์œ„์น˜๋„ ์ค‘์š”ํ•˜๋‹จ ์–˜๊ธฐ์ž„
โ€ข multi-document QA์—์„œ๋Š” ์ƒ๋Œ€์ ์œผ๋กœ ์ด ๊ธฐ๋ฒ•์˜ ํšจ๊ณผ๋Š” ์ ์—ˆ์ง€๋งŒ, ์ด ๋•Œ๋„ ์ค‘์š” document์˜ ์ƒ๋Œ€์  ์œ„์น˜์— ๋”ฐ๋ผ ์˜ํ–ฅ ์ •๋„๊ฐ€ ๋‹ฌ๋ผ์ง
(์ค‘์š” document๊ฐ€ ์•ž์ชฝ์— ์œ„์น˜ํ•˜๋Š” ๊ฒฝ์šฐ๋Š” multi-document QA์—์„œ๋„ ์ด ๋ฐฉ๋ฒ•์ด ์œ ํšจํ–ˆ์Œ)

Is more context always better? -> Of course not.
• Once a certain amount of context is already present, adding more content, even relevant content, brings negligible gains.
• [Third figure] Adding irrelevant content unsurprisingly tends to hurt performance.
• Also, comparing an extended-context model with the original model on identical inputs, there is almost no performance difference.
(Here they compared GPT-3.5-Turbo and GPT-3.5-Turbo (16K).
The 16K version costs twice as much, and stuffing in more doesn't help, so it's most economical to fit your usage into the smaller-context model where possible.)
Reference: experimental setup
Task: multi-document QA, key-value retrieval
Models: OpenAI's GPT-3.5-Turbo, Anthropic's Claude, MPT-30B-Instruct, LongChat-13B (16K)
https://www.linkedin.com/posts/philipp-schmid-a6a2bb196_are-vector-databases-here-to-stay-yes-activity-7085908435686285312-QVfB/?utm_source=share&utm_medium=member_android
<์˜์‚ฌ๊ฒฐ์ •์˜ ์ˆœ์„œ>
์ „๋ฌธ๊ฐ€์™€ ๋น„์ „๋ฌธ๊ฐ€์˜ ์ฐจ์ด๊ฐ€ ํฌ๊ฒŒ ๋„๋“œ๋ผ์ง€๋Š” ๋ถ€๋ถ„ ์ค‘ ํ•˜๋‚˜๋Š” ์ผ์˜ ์ˆœ์„œ์ด๋‹ค. ํŠนํžˆ ์ธ์ง€์  ๊ณผ์—…์ผ ๊ฒฝ์šฐ ์˜์‚ฌ๊ฒฐ์ •์˜ ์ˆœ์„œ๊ฐ€ ์ „๋ฌธ๊ฐ€์™€ ๋น„์ „๋ฌธ๊ฐ€๋ฅผ ๊ฐ€๋ฅธ๋‹ค. ์–ด๋–ค ์ˆœ์„œ๋กœ ์˜์‚ฌ๊ฒฐ์ •์„ ํ•˜๋ฉด ์ผ์„ ํ•˜๊ธฐ๊ฐ€ ํ›จ์”ฌ ์ˆ˜์›”ํ•œ๋ฐ, ๊ทธ๋ ‡์ง€ ์•Š์œผ๋ฉด ์ผํ•˜๋ฉด์„œ ํ—ค๋งค๊ฒŒ ๋˜๊ณ  ์–ด๋ ค์›€์„ ๊ฒช๊ฒŒ ๋œ๋‹ค.
ํ•ด์•ผํ•  ์˜์‚ฌ๊ฒฐ์ •๋“ค์ด ๋งŽ์„ ๋•Œ ์ž์‹ ์—๊ฒŒ ๋ฌผ์–ด์•ผํ•  ์งˆ๋ฌธ์€ ์ด๋ ‡๋‹ค. "์–ด๋А ์˜์‚ฌ๊ฒฐ์ •์„ ๋จผ์ € ํ•˜๋ฉด ๋‹ค๋ฅธ ์˜์‚ฌ๊ฒฐ์ •๋“ค์„ ๋‚ด๋ฆฌ๊ธฐ๊ฐ€ ํ›จ์”ฌ ์ˆ˜์›”ํ•ด์งˆ๊นŒ?"
์ด๊ฒƒ์ด ๋ฐ”๋กœ ๊ฑด์ถ•ํ•™์ž ํฌ๋ฆฌ์Šคํ† ํผ ์•Œ๋ ‰์‚ฐ๋”๊ฐ€ ๋งํ•˜๋Š” ์‹œํ€€์Šค(sequence)๋ผ๋Š” ๊ฐœ๋…์ด๋‹ค.
๊ฑด์ถ•ํ•™์ž์ธ ํฌ๋ฆฌ์Šคํ† ํผ ์•Œ๋ ‰์‚ฐ๋”์— ๋”ฐ๋ฅด๋ฉด ์ž‘์€ ์ง‘ ํ•˜๋‚˜๋ฅผ ์ง“๋Š”๋ฐ์—๋งŒ ํ•ด์•ผํ•  ์˜์‚ฌ๊ฒฐ์ •์ด ์ˆ˜๋ฐฑ๊ฐœ์— ๋‹ฌํ•œ๋‹ค๊ณ  ํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋ฉด ๊ฐ€๋Šฅํ•œ ์˜์‚ฌ๊ฒฐ์ •์˜ ์ˆœ์„œ๋Š” ์ฒœ๋ฌธํ•™์ ์œผ๋กœ ๋Š˜์–ด๋‚˜๊ฒŒ ๋œ๋‹ค. ํ•˜์ง€๋งŒ ๊ฐ€์žฅ ํ•ต์‹ฌ์ ์ธ(center) ์˜์‚ฌ๊ฒฐ์ •๋ถ€ํ„ฐ ํ•˜๋‚˜์”ฉ ์ง„ํ–‰ํ•˜๊ฒŒ ๋˜๋ฉด ์ ์  ์ผ์ด ์ •๋ˆ๋˜๊ณ  ์ˆ˜์›”ํ•˜๊ฒŒ ํ’€๋ฆฌ๊ฒŒ ๋œ๋‹ค. ํ–ˆ๋˜ ๊ฑธ ๋‹ค์‹œ ๋˜๋Œ๋ ค์•ผ ํ•  ํ™•๋ฅ ๋„ ํ™• ์ค„์–ด๋“ ๋‹ค.
์˜ˆ๋ฅผ ๋“ค์–ด, ๊ทธ๋Š” ๋ถ€์—Œ์„ ๋งŒ๋“ค ๋•Œ์— ๊ฐ€์žฅ ์šฐ์„ ์ ์œผ๋กœ ํ•  ์˜์‚ฌ๊ฒฐ์ • ์ค‘ ํ•˜๋‚˜๋กœ ์‹ํƒ์˜ ์œ„์น˜ ์ •ํ•˜๊ธฐ๋ฅผ ๊ผฝ๋Š”๋‹ค. ์™œ๋ƒํ•˜๋ฉด ๊ฒฐ๊ตญ (์„œ์–‘์‹) ๋ถ€์—Œ์˜ ๊ฐ€์žฅ ํ•ต์‹ฌ์ด ๋˜๋Š” ํ™œ๋™์€ ๊ฐ€์กฑ์ด ํ•จ๊ป˜ ์‹์‚ฌ๋ฅผ ํ•˜๋Š” ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. ์ด๊ฒŒ ์ •ํ•ด์ง€๋ฉด ๋‹ค๋ฅธ ๋ชจ๋“  ๊ฒƒ๋“ค์€ ๋ถ€์ฐจ์ ์ธ ์˜์‚ฌ๊ฒฐ์ •์ด ๋˜์–ด ๊ต‰์žฅํžˆ ์‰ฝ๊ฒŒ ํ’€๋ ค๋ฒ„๋ฆฐ๋‹ค. ๊ทผ๋ฐ, ์ด๊ฑธ ์ •ํ•˜์ง€ ์•Š์€ ์ƒํƒœ์—์„œ ๋ƒ‰์žฅ๊ณ ๋ฅผ ์–ด๋””์— ๋‘˜์ง€ ์ •ํ•˜๋Š” ๊ฒƒ์€ ๊ธฐ์ค€์ด ๋ช…ํ™•ํ•˜์ง€ ์•Š์•„ ์–ด๋ ค์›€์ด ๋งŽ์„ ๊ฒƒ์ด๋‹ค.
๊ฑด์ถ•๋ฟ๋งŒ์ด ์•„๋‹ˆ๋‹ค. ์ธ์ง€์  ์ž‘์—…์„ ํ•˜๋Š” ๊ฒฝ์šฐ ๋ชจ๋‘๊ฐ€ ํ•ด๋‹นํ•œ๋‹ค. ๋‚ด๊ฐ€ ์˜์‚ฌ๊ฒฐ์ •์„ ๋‚ด๋ ค์•ผ ํ•˜๋Š”๋ฐ ๊ฐ๋„ ์ž˜ ์•ˆ์˜ค๊ณ  ๋ชจํ˜ธํ•˜๋‹ค๋ฉด ์ˆœ์„œ๊ฐ€ ์ž˜๋ชป๋œ ๊ฑฐ ์•„๋‹Œ๊ฐ€ ์ƒ๊ฐํ•ด ๋ณด๋Š” ๊ฒƒ์ด ๋„์›€์ด ๋  ๊ฒƒ์ด๋‹ค.
2. <์ƒ์„ฑ์  ์ˆœ์„œ>

๋‹ค์Œ์€ ๊ฑด์ถ•๊ฐ€ Christopher Alexander์˜ ์—ญ์ž‘ Nature of Order 2๊ถŒ(p. 317)์— ๋‚˜์˜ค๋Š” ์ผํ™”๋ฅผ ๋‚ด๊ฐ€ ๋ฒˆ์—ญํ•˜๊ณ (๊ธ‰ํ•˜๊ฒŒ ํ•˜๋А๋ผ ์ข€ ๊ฑฐ์น ๋‹ค) ํŽธ์ง‘ํ•œ ๊ฒƒ์ด๋‹ค. ์—ฌ๊ธฐ์— ์• ์ž์ผ์˜ ์˜ค์˜ๅฅง็พฉ๊ฐ€ ์ˆจ์–ด ์žˆ๋‹ค. ์‚ฌ์‹ค ์ด๋Ÿฐ ๊ฒŒ ํšŒ๊ณ ๋‹ˆ ์Šคํฌ๋Ÿผ ๋ฏธํŒ…์ด๋‹ˆ ํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค ์˜ค๋ฐฑ๋ฐฐ ๋” ์ค‘์š”ํ•˜๋‹ค.

---

๊ฑด์ถ•์€ ๋ณต์žกํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋””์ž์ด๋„ˆ๋Š” ๋ชจ๋“  ๊ฑธ ํ•œ๋ฒˆ์— ํ•ด์•ผํ•œ๋‹ค๋Š” ์ด์•ผ๊ธฐ๊ฐ€ ํผ์ ธ์žˆ๋‹ค. "๋ณต์žกํ•œ ์ „์ฒด๊ฐ€, ๋ถ€๋ถ„์œผ๋กœ ๋‚˜๋‰˜์ง€ ์•Š๋Š” ํ•˜๋‚˜๋กœ ์กด์žฌํ•˜๋Š”๋ฐ ์–ด๋–ป๊ฒŒ ํ•œ ๋ฒˆ์— ํ•˜๋‚˜์”ฉ ์ฒ˜๋ฆฌํ•˜๋ฉด์„œ ์„ค๊ณ„๋ฅผ ์„ฑ๊ณต์ ์œผ๋กœ ํ•  ์ˆ˜ ์žˆ๊ฒ ์–ด์š”?" ๊ฐ™์€ ์งˆ๋ฌธ์„ ๋ณด๋ฉด ์•Œ ์ˆ˜ ์žˆ๋‹ค.

๊ฑด์ถ•์„ ๊ณต๋ถ€ํ•˜๋Š” ํ•™์ƒ ํ•˜๋‚˜๋ฅผ ๊ฐ€๋ฅด์น˜๋ ค๊ณ  ๋‚ด๊ฐ€ ์—ฌ๋Ÿฌ๋‹ฌ ๋…ธ๋ ฅ์„ ํ–ˆ๋‹ค. ๊ทธ๋Š” ๋›ฐ์–ด๋‚œ ํ•™์ƒ์ด์—ˆ์ง€๋งŒ ์„ค๊ณ„์— ๋Œ€ํ•ด ์ทจ์•ฝํ–ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‹ค๊ฐ€ ๊ทธ ์นœ๊ตฌ๊ฐ€ ์ง‘ ํ•˜๋‚˜๋ฅผ ์„ค๊ณ„ํ•ด์•ผ ํ•˜๋Š” ๋””์ž์ธ ์ˆ˜์—…์— ๊ฐ™์ด ์ฐธ์—ฌํ•˜๊ฒŒ ๋˜์—ˆ๋‹ค. ๊ทธ ํ•™์ƒ์€ ์ œ๋Œ€๋กœ ์„ค๊ณ„๋ฅผ ํ•˜๋ ค๊ณ  ์•„๋ฌด๋ฆฌ ๋…ธ๋ ฅํ•ด๋„ ์ž˜ ๋˜์ง€ ์•Š์•˜๋‹ค. ๊ทธ์˜ ๋“œ๋กœ์ž‰ ๋ณด๋“œ ์œ„์—๋Š” ๋‚œ์žกํ•œ ํ”์ ์ด ๊ฐ€๋“ํ–ˆ๋‹ค.

When someone can't design properly, it's usually because they approach it in the wrong order. They keep sliding into confusion, bouncing among all the possible issues.

Then one day I sat down with the student and we talked.

"์˜ค๋Š˜ ๋‹น์‹ ์˜ ์„ค๊ณ„๋ฅผ ์ €๋ž‘ ๋Œ€ํ™”ํ•˜๋ฉด์„œ ํ’€์–ด๋ณด๋ฉด ์–ด๋–จ๊นŒ์š”. ์ผ๋‹จ ์ด๋ฏธ ๊ฐ–๊ณ  ์žˆ๋Š” ๊ฑด ์žŠ์œผ์„ธ์š”. ์ „์ฒด ์„ค๊ณ„๋ฅผ ๋‹ค ์ง€์›Œ๋ฒ„๋ฆฌ๊ณ ์š”. ์•„๋ฌด๊ฒƒ๋„ ์—†๋Š” ๊ฑฐ๋กœ ์‹œ์ž‘ํ•ฉ์‹œ๋‹ค."

"๊ทธ ํ˜„์žฅ(site)์—์„œ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๊ฑด ๋ญ๊ณ , ๋˜ ๊ทธ ํ˜„์žฅ๊ณผ ๊ด€๋ จํ•ด ๋‹น์‹ ์˜ ์„ค๊ณ„๊ฐ€ ํ•ด์•ผํ•  ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๊ฒƒ์€ ๋ฌด์—‡์ธ๊ฐ€์š”? ๋‚˜๋จธ์ง€๋Š” ๊ฑฑ์ •ํ•˜์ง€ ๋งˆ์„ธ์š”. ์ด ์งˆ๋ฌธ ํ•˜๋‚˜์— ๋Œ€ํ•ด์„œ๋งŒ ์–˜๊ธฐํ•ด์ฃผ์„ธ์š”."

"์ข‹์•„์š”. ๊ทธ๋Ÿผ ํ‘œ์‹œ๋ฅผ ํ•ด๋ณด์ฃ . ๊ทธ๊ฑฐ ํ•˜๋‚˜๋งŒ ๋”ฑ ๋„ฃ์–ด๋ด์š”. ๋‚˜๋จธ์ง€๋Š” ๋ชจ๋‘ ์žŠ์–ด๋จน๊ณ ์š”."

"์ข‹์•„์š”. ์ด์ œ ๋‹ค์Œ์œผ๋กœ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๊ฒŒ ๋ญ”์ง€ ๋งํ•ด์ฃผ์„ธ์š”."

This went on for about an hour. When the student said something was important but I doubted he actually felt it was important, I would look at him and ask: no, what is truly the most important thing next?

๊ทธ๋ ‡๊ฒŒ ํ•œ์‹œ๊ฐ„์ด ๋๋‚˜๊ฐˆ ๋•Œ ์ฏค์—๋Š” ๊ทธ๋Š” ์•„๋ฆ„๋‹ค์šด ๊ฑด๋ฌผ์„ ์™„์„ฑํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค.

๊ทธ ํ•™์ƒ์ด ๋‚˜์—๊ฒŒ ์™€์„œ ์ด๋ ‡๊ฒŒ ๋งํ–ˆ๋‹ค. "์ด๊ฒŒ ์ด๋ ‡๊ฒŒ ๊ฐ„๋‹จํ•˜๋ฆฌ๋ผ๊ณ ๋Š” ์ƒ๊ฐ๋„ ๋ชปํ–ˆ์–ด์š”. ์ „์—๋Š” ํ•œ๋ฒˆ๋„ ์ด๊ฑธ ์ดํ•ดํ•œ ์ ์ด ์—†์—ˆ์–ด์š”. ๊ทผ๋ฐ ์ด์ œ ๋งˆ์นจ๋‚ด ์ดํ•ด๋ฅผ ํ–ˆ์–ด์š”. ์„ ์ƒ๋‹˜์ด ์ด์ œ๊ป ์–˜๊ธฐํ•ด ์˜จ ๊ฒŒ ๋ฌด์Šจ ๋ง์ธ์ง€์š”. ์„ ์ƒ๋‹˜์€ ๊ทธ๋ƒฅ ํ•œ๋ฒˆ์— ๋”ฑ ํ•˜๋‚˜์”ฉ๋งŒ ๊ฐ–๊ณ  ํ•˜์‹œ๊ณ  ๋˜ ๊ทธ๊ฑธ ๋ฐ”๋ฅธ ์ˆœ์„œ๋กœ ํ•˜์‹œ๋Š” ๊ฑฐ์—์š”. ๊ทธ๊ฒŒ ์ „๋ถ€์ฃ . ๋‹จ์ง€ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๊ฑธ ํ•˜๋ผ๋Š” ๊ฑฐ. ๊ทธ๋ฆฌ๊ณ  ๊ทธ๊ฑธ ๋๋‚ด๋ผ๋Š” ๊ฑฐ."

์ด๊ฒŒ ์ž‘๋™ํ•˜๋Š” ์ด์œ ๋Š” ์ž์—ฐ์˜ ๋ชจ๋“  ์‹œ์Šคํ…œ๊ณผ ๊ฐ™์ด, ์‹ค์ œ๋กœ ํŽผ์ณ์ง€๋Š” ๊ณผ์ •์ด ๊ทธ ์ž์ œ๋กœ "์ „์ฒด"(whole)๋ผ๋Š” ๊ฒƒ ๋•Œ๋ฌธ์ด๋‹ค. ์ธ๊ฐ„์˜ ๋ชจ๋“  ํŽผ์ณ์ง€๋Š” ๊ณผ์ •์˜ ํ•ต์‹ฌ์€, ์˜ˆ์ˆ ๊ฐ€ ํ˜น์€ ๊ฑด์ถ•๊ฐ€๊ฐ€ ๋ฐ”๋กœ ์ฒซ๋‚ ๋ถ€ํ„ฐ ์ž๊ธฐ๊ฐ€ ๋งŒ๋“œ๋Š” ๊ฑธ ์ „์ฒด๋กœ, ์˜จ์ „ํ•œ ํ•˜๋‚˜์˜ ๊ฒƒ์œผ๋กœ ์‹œ๊ฐํ™”ํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•œ๋‹ค๋Š” ๊ฑฐ๋‹ค. ์‹ฌ์ง€์–ด ๋‹น์‹ ์ด ์‹œ์ž‘ํ•˜๊ธฐ๋„ ์ „์—, ๊ทธ๊ฑธ ์„ ํƒํ•˜๊ณ  ๊ทธ๊ฒŒ ๋ฐ”๋กœ ์ „์ฒด์ธ ๊ฑฐ์ฒ˜๋Ÿผ ๋ด์•ผ ํ•œ๋‹ค. ๊ทธ๊ฑธ ์ „์ฒด๋กœ์„œ ๋А๋ผ๊ณ  ๊ทธ ์ „์ฒด์„ฑ ์†์—์„œ ๊ทธ ๋ถ€๋ถ„์„ ์ƒ์ƒํ•ด์•ผ ํ•˜๋ฉฐ... ์ ์ง„์ ์œผ๋กœ ๊ฑฐ๊ธฐ์—์„œ ์ด ์ „์ฒด์„ฑ์˜ ํŠน์„ฑ๋“ค์„ ๋Œ์–ด๋‚ด์•ผ ํ•œ๋‹ค.

Changjun Kim
#Startup idea: a QA platform for LLMs

LLM์„ ํ™œ์šฉํ•œ ์„œ๋น„์Šค๊ฐ€ ๋งŽ์•„์ง€๋ฉด, LLM์˜ Hallucination์„ ์žก๋Š” ๊ฒŒ ์ค‘์š”ํ•˜๊ณ  ์ด๋ฅผ ์œ„ํ•ด์„œ ์—ฌ๋Ÿฌ Iteration์„ ๋Œ๋ฉด์„œ ์—”์ง€๋‹ˆ์–ด๋ง์„ ํ•ด์•ผํ•˜๋Š”๋ฐ์š”.

For example, if you want to bring a chatbot application into a shopping mall, the chatbot's usefulness in an e-commerce setting has to be tied to metrics like product clicks, and feedback and model adjustment should be driven by those metrics. But there are a lot of genuinely hard parts to doing this. And to raise the chatbot's performance you have to improve several factors (data quality, training method, debugging method, prompts, and so on). Doing all of that in-house looks pretty brutal haha
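As a sketch of what "tie the chatbot to product metrics" could mean in code (all field names here are invented for illustration): log each answer variant with its downstream outcome, then compare variants on the metric.

```python
from collections import defaultdict

# Hypothetical event log; in practice this comes from an analytics pipeline.
events = [
    {"variant": "prompt_v1", "clicked_product": True},
    {"variant": "prompt_v1", "clicked_product": False},
    {"variant": "prompt_v2", "clicked_product": True},
    {"variant": "prompt_v2", "clicked_product": True},
]

def click_through_rate(events):
    stats = defaultdict(lambda: [0, 0])          # variant -> [clicks, total]
    for e in events:
        stats[e["variant"]][0] += e["clicked_product"]
        stats[e["variant"]][1] += 1
    return {v: clicks / total for v, (clicks, total) in stats.items()}

print(click_through_rate(events))  # feed the winner into the next iteration
```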

https://snorkel.ai/๋„ ์ด๋Ÿฐ ๋น„์ฆˆ๋‹ˆ์Šค ํ•ด๋ณด๊ณ  ์‹ถ์–ดํ•˜๋Š” ๊ฒƒ ๊ฐ™๊ณ ์š”.

https://dust.tt/ seems to be a team focused on letting non-ML engineers build bots through an easy GUI haha
### What is Flash Attention?

FlashAttention is an algorithm designed to improve the performance, cost, and latency of attention mechanisms in transformer models, particularly in large language models (LLMs) and other transformer-based architectures**[1](https://www.quora.com/How-does-flash-attention-work)**. It can replace standard attention mechanisms in various applications, offering significant benefits in terms of speed, memory efficiency, and training costs**[1](https://www.quora.com/How-does-flash-attention-work)**. It has been widely adopted in large language model (LLM) libraries due to its significant speedup and memory efficiency**[1](https://crfm.stanford.edu/2023/07/17/flash2.html)[11](https://github.com/Dao-AILab/flash-attention)**.

### Model Performance

FlashAttention-2, the latest version, is 2x faster than its predecessor, FlashAttention, and 5-9x faster than standard attention implementations**[1](https://crfm.stanford.edu/2023/07/17/flash2.html)**. This speedup allows for training models with twice as long context for the same training cost as before**[1](https://crfm.stanford.edu/2023/07/17/flash2.html)**. When used end-to-end to train GPT-style language models, FlashAttention-2 reaches a training speed of up to 225 TFLOPs/s**[1](https://crfm.stanford.edu/2023/07/17/flash2.html)[5](https://twitter.com/_akhaliq/status/1680988185776607237)**.

### Cost and Latency

FlashAttention-2 addresses the inefficiencies of its predecessor by reformulating the algorithm to reduce non-matmul FLOPs, improve parallelism, and optimize work partitioning between thread blocks and warps on GPUs**[1](https://crfm.stanford.edu/2023/07/17/flash2.html)**. Non-matmul FLOPs are more expensive than matmul FLOPs, so these improvements lead to significant speedup and reduced latency**[1](https://crfm.stanford.edu/2023/07/17/flash2.html)**. The algorithm also minimizes communication and synchronization between warps, further improving performance**[1](https://crfm.stanford.edu/2023/07/17/flash2.html)**. FlashAttention-2 is more memory-efficient than exact attention baselines, with memory usage linear in sequence length rather than quadratic**[3](https://arxiv.org/pdf/2205.14135.pdf)[16](https://www.adept.ai/blog/flashier-attention)**. This allows it to scale to much longer sequence lengths, enabling the training of higher-quality models**[2](https://arxiv.org/abs/2205.14135)**.

### What is better?

Incumbents, or standard attention mechanisms, have quadratic memory usage and slower training times due to their computation and memory access patterns**[4](https://crfm.stanford.edu/2023/01/13/flashattention.html)**. FlashAttention, on the other hand, offers the following advantages:

1. Faster training times: FlashAttention-2, the latest version, is 2x faster than its predecessor, FlashAttention, and 5-9x faster than standard attention implementations**[1](https://www.quora.com/How-does-flash-attention-work)**. This speedup allows for faster training times and longer context lengths for the same training cost as before**[1](https://www.quora.com/How-does-flash-attention-work)**.
2. Longer context lengths (memory efficiency): FlashAttention reduces memory usage from quadratic to linear in sequence length by leveraging tiling and recomputation techniques**[4](https://crfm.stanford.edu/2023/01/13/flashattention.html)**. This allows it to scale to much longer sequence lengths, enabling the training of higher-quality models**[2](https://github.com/HazyResearch/flash-attention)**.
3. Reduced training costs: The improvements in training speed and memory efficiency can result in reduced training costs**[1](https://www.quora.com/How-does-flash-attention-work)**.

### How is it possible?

The improvements in FlashAttention-2 were made by:
1. Tweaking the algorithm to reduce the number of non-matmul FLOPs**[1](https://github.com/HazyResearch/flash-attention)**.
2. Parallelizing the attention computation, even for a single head, across different thread blocks to increase occupancy**[1](https://github.com/HazyResearch/flash-attention)**.
3. Distributing the work between warps within each thread block to reduce communication through shared memory**[1](https://github.com/HazyResearch/flash-attention)**.

It achieves this by exploiting the asymmetric GPU memory hierarchy, bringing significant memory savings (linear instead of quadratic) and runtime speedup**[4](https://arxiv.org/abs/2205.14135)**. Additionally, FlashAttention-2 minimizes communication and synchronization between warps, further improving performance**[1](https://github.com/HazyResearch/flash-attention)**.
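The algorithmic core (tiling plus an online softmax, so the full attention matrix is never materialized) fits in a few lines. Below is a toy single-query NumPy version for intuition under simplified assumptions, not the fused CUDA kernel:

```python
import numpy as np

def tiled_attention(q, K, V, block=64):
    # Process K/V in tiles, keeping a running max (m), normalizer (l),
    # and weighted-value accumulator, so memory stays linear in length.
    m, l = -np.inf, 0.0
    acc = np.zeros(V.shape[1])
    scale = 1.0 / np.sqrt(q.shape[0])
    for i in range(0, K.shape[0], block):
        s = K[i:i + block] @ q * scale   # scores for this tile only
        m_new = max(m, s.max())
        fix = np.exp(m - m_new)          # rescale previous accumulators
        p = np.exp(s - m_new)
        l = l * fix + p.sum()
        acc = acc * fix + p @ V[i:i + block]
        m = m_new
    return acc / l

q = np.random.randn(64); K = np.random.randn(1024, 64); V = np.random.randn(1024, 32)
ref = np.exp(K @ q / 8 - (K @ q / 8).max()); ref /= ref.sum()
assert np.allclose(tiled_attention(q, K, V), ref @ V)  # matches exact attention
```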

### Implication for industry players
1. Chipmakers: FlashAttention reduces the computational and memory requirements for attention mechanisms in transformer models, driving the demand for more efficient and specialized hardware to further optimize the performance of these models**[11](https://www.ft.com/content/8352e84b-284c-4ebe-a7c1-5e2093566e0d)**.

This could lead to innovations in GPU architectures and the development of specialized AI accelerators that are better suited for handling the reduced computational and memory requirements of attention mechanisms in transformer models. Some components that could be affected include:

1. Memory hierarchy: FlashAttention exploits the asymmetric GPU memory hierarchy, which could lead to the development of new memory architectures that further optimize memory access patterns for attention mechanisms**[4](https://openreview.net/forum?id=H4DqfPSibmx)**.
2. Parallelism: FlashAttention improves parallelism in attention computation, which could influence the design of GPU architectures and AI accelerators to better support parallel processing for transformer models**[1](https://arxiv.org/pdf/2205.14135.pdf)**.
3. Communication and synchronization: FlashAttention reduces communication and synchronization between warps, which could impact the design of interconnects and synchronization mechanisms in GPU and AI accelerator architectures**[1](https://arxiv.org/pdf/2205.14135.pdf)**.
2. LLM makers: FlashAttention can help improve the efficiency of large language models (LLMs) by speeding up training times, allowing for longer context lengths, and reducing training costs**[1](https://arxiv.org/abs/2205.14135)**. This can lead to the development of more powerful LLMs and the creation of new AI services based on these models.
- By speeding up the attention mechanism and reducing memory requirements, FlashAttention allows for longer context lengths during training, which can lead to better model performance**[1](https://ahmdtaha.medium.com/flashattention-fast-and-memory-efficient-exact-attention-with-io-awareness-2a0aec52ed3d)**. This efficiency enables LLM makers to train more powerful models without sacrificing quality, as FlashAttention computes exact attention without any approximation**[3](https://arxiv.org/pdf/2205.14135.pdf)**.
- FlashAttention can also help reduce the cost for training, scaling, deploying, or fine-tuning LLMs by offering faster training times, longer context lengths, and reduced training costs**[1](https://ahmdtaha.medium.com/flashattention-fast-and-memory-efficient-exact-attention-with-io-awareness-2a0aec52ed3d)**. This is achieved through its improved memory efficiency, which allows it to scale to much longer sequence lengths, and its faster training times compared to standard attention mechanisms**[1](https://ahmdtaha.medium.com/flashattention-fast-and-memory-efficient-exact-attention-with-io-awareness-2a0aec52ed3d)**.
- FlashAttention could affect the open-source ecosystem by providing an efficient alternative to standard attention mechanisms. Its open-source implementation**[11](https://aws.amazon.com/blogs/machine-learning/new-performance-improvements-in-amazon-sagemaker-model-parallel-library/)** can be integrated into various open-source libraries and frameworks, leading to wider adoption and further development of the algorithm. This can drive innovation and efficiency across various industries and players, leading to the development of more powerful AI models and services.
3. Infrastructure builders for LLMs (e.g., MosaicML): By integrating FlashAttention into their infrastructure offerings, these companies can enable more efficient and cost-effective training and deployment of LLMs**[15](https://www.mosaicml.com/blog/mpt-7b)**. This can lead to wider adoption of AI technologies and more advanced AI services.
4. LLM-using services (e.g., Perplexity): Services that rely on LLMs, such as natural language processing, machine translation, text summarization, and sentiment analysis, can benefit from the improved performance and efficiency provided by FlashAttention**[6](https://www.wsj.com/articles/memory-chip-makers-struggle-with-decline-in-demand-and-price-falls-11665141235)**. Faster training times and longer context lengths can lead to better performance in these tasks, enabling the development of more advanced AI services.
Inviting AI engineers interested in training large language models (LLMs) and running LLM services.
Actually training an LLM and serving it comes with plenty of difficulties. Come share your experiences building services on top of LLMs 🙂

Register for the session: https://lu.ma/agitownjuly2

ํ•ด๋‹น ์„ธ์…˜์—์„œ๋Š” ์•„๋ž˜ ๋‚ด์šฉ์— ๋Œ€ํ•ด์„œ ๋‹ค๋ฃฐ ์˜ˆ์ •์ž…๋‹ˆ๋‹ค.

1๏ธโƒฃ ์ž์ฒด LLM ํ•™์Šต์˜ ์žฅ๋‹จ์ : LLM ์ง์ ‘ ํ•™์Šต์‹œ์ผœ์•ผํ• ๊นŒ์š” ์•„๋‹ˆ๋ฉด Third Party ์†”๋ฃจ์…˜์„ ์จ์•ผํ• ๊นŒ์š”? ์ง์ ‘ ํ•™์Šตํ•˜๊ฒŒ ๋˜๋ฉด ํŠน์ • ์š”๊ตฌ์‚ฌํ•ญ์— ๋งž๊ฒŒ ๋ชจ๋ธ์„ ์ˆ˜์ •ํ•˜๊ณ , ๋ชจ๋ธ ๋””๋ฒ„๊น…์ด ํŽธํ•˜์ง€๋งŒ, ๋†’์€ ํ•™์Šต๋น„์šฉ๊ณผ ๊ณ ํ’ˆ์งˆ์˜ ๋ฐ์ดํ„ฐ ๊ฐ€๊ณต ๋“ฑ ์‹ ๊ฒฝ์จ์•ผํ•  ๋ถ€๋ถ„์ด ๋งŽ์Šต๋‹ˆ๋‹ค.

2๏ธโƒฃ Quality, Cost, Latency ๊ฐ„์˜ ํŠธ๋ ˆ์ด๋“œ์˜คํ”„: ๋น„์šฉ ์ง‘์•ฝ์ ์ธ GPU ์ถ”๋ก ๋ถ€ํ„ฐ ๋น„์šฉ ๊ด€๋ฆฌ๋ฅผ ์œ„ํ•œ ์—”์ง€๋‹ˆ์–ด๋ง์˜ ๊ธฐ๋ณธ์— ์ด๋ฅด๊ธฐ๊นŒ์ง€ ์„ฑ๋Šฅ, ๋น„์šฉ, ์‹œ๊ฐ„ ๊ฐ„์˜ ๊ท ํ˜•์„ ๋งž์ถ”๋Š” ๋ฐฉ๋ฒ•์„ ์‚ดํŽด๋ด…๋‹ˆ๋‹ค.

3๏ธโƒฃ ์‹ ๋ขฐ, ์•ˆ์ •์„ฑ ๋ฐ ๊ฐœ์ธ ์ •๋ณด ๋ณดํ˜ธ: LLM์—์„œ ํ”ํžˆ ๋ฐœ์ƒํ•˜๋Š” 'Hallucination'์˜ ํ•จ์ •, ์ข‹์€ ๋ฐ์ดํ„ฐ๋ฅผ ์–ป๊ธฐ ์œ„ํ•œ ๊ธฐ์ˆ , ๊ฐœ์ธ ์ •๋ณด๋ฅผ ๋ณดํ˜ธํ•˜๋ฉด์„œ ๋†’์€ ๋ฐ์ดํ„ฐ ํ’ˆ์งˆ์„ ์œ ์ง€ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์„ธ์š”. ๋˜ํ•œ ๋ชจ๋ธ ํŽธํ–ฅ์„ฑ, ๋…์„ฑ, ํ’ˆ์งˆ ๊ด€๋ฆฌ์˜ ๋ณต์žก์„ฑ๊ณผ ๋ชจ๋ธ ์„ค๋ช… ๊ฐ€๋Šฅ์„ฑ ๋ฐ ํˆฌ๋ช…์„ฑ ๋ฌธ์ œ์— ๋Œ€ํ•ด์„œ๋„ ๋…ผ์˜ํ•  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค.

4๏ธโƒฃ Latency: ์ง€์—ฐ ์‹œ๊ฐ„์ด ๋ฌธ์ œ๊ฐ€ ๋˜๋Š” ์ด์œ ๋Š” ๋ฌด์—‡์ด๋ฉฐ ์–ด๋–ป๊ฒŒ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์„๊นŒ์š”? Transformer ์•„ํ‚คํ…์ฒ˜์˜ ํ•œ๊ณ„์™€ the potential of models like sequential state space models, Flash Attention Model์˜ ์ž ์žฌ๋ ฅ์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์„ธ์š”.

5๏ธโƒฃ ๋ชจ๋ธ ํ•™์Šต ๋ฐฉ๋ฒ• ๋ฐ ์—”์ง€๋‹ˆ์–ด๋ง: GPT-4์—์„œ ์ „๋ฌธ๊ฐ€ ํ˜ผํ•ฉ ๋ชจ๋ธ(MoE)์„ ์„ฑ๊ณต์ ์œผ๋กœ ๊ตฌํ˜„ํ•œ ์‚ฌ๋ก€, ๋‹ค์ค‘ ์ฟผ๋ฆฌ ์ฃผ์˜(MQA)์˜ ์ž ์žฌ๋ ฅ, ๊ทธ๋ฆฌ๊ณ  ๋ชจ๋ธ์˜ ๋ฏธ๋ž˜์— ๋Œ€ํ•œ ์˜ˆ์ธก์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์„ธ์š”.

Reading list for the session

1. [Unleashing LLMs in Production: Challenges & Opportunities. Chip Huyen, Amjad Masad & Michele Catasta](https://youtu.be/ByhMpN2iSbc)
2. [Real-time machine learning: challenges and solutions](https://huyenchip.com/.../real-time-machine-learning...)
3. [Building LLM applications for production](https://huyenchip.com/2023/04/11/llm-engineering.html)
4. [Efficiently Scaling and Deploying LLMs // Hanlin Tang // LLM's in Production Conference](https://youtu.be/AVccFl8-5-8)
5. [Cost Optimization and Performance // LLMs in Production Conference Panel Discussion 2](https://youtu.be/wxq1ZeAM9fc)
6. [Solving the Last Mile Problem of Foundation Models with Data-Centric AI](https://youtu.be/-oDgV6q6KtI...)
   - [Everyone will soon be using foundation models (FMs) like GPT-4.](https://threadreaderapp.com/thread/1642666624091312129.html)
7. [Debugging LLMs: Best Practices for Better Prompts and Data Quality](https://youtu.be/OsP1PAKyHq0)
Back in 1997, Netflix was just 6 people in a 1000 sqft office in Santa Cruz. Today, the business is worth $200b+, has 10k+ employees globally, and is the world's largest streaming platform with 230m+ subscribers. Foundation Capital was lucky to be the first investor in the company and witness the journey from inception through IPO.

I recently sat down with my friend Jim Cook, one of the co-founders of Netflix, to hear some stories about the company in the early days. Below are a few lessons that will hopefully be valuable to those in the tech and startup ecosystem:

- Obsess over your customers - Netflix truly obsessed over making their early customers happy. They would never ask the question "would you pay for this?" (because the answer is often a lazy yes) but rather "what would make you rave about this to your friends?" Despite having many opportunities to sell ads on the red envelopes they shipped DVDs in, they always refused, citing that ads would only ruin the experience. The company always focused on creating experiences that made people rave, leading to insane organic growth in the early years.

- Do things that don't scale - in the early days of Netflix, Jim would spend hours stuffing envelopes with DVDs and hauling packages to and from the post office. The early "machine learning" recommendations for the website were literally crowdsourced from small focus groups on Usenet forums.

- Word-of-mouth is the best GTM strategy - Netflix did not spend a dime on advertising until 2005, 8 years after its founding! They focused entirely on word-of-mouth to acquire customers and ensured they had very strong product-market fit before scaling paid acquisition.

- Compensate innovatively and generously - Today, Netflix is well known for paying top talent well above market rates. In the early days, they were the first major company to offer "flexible compensation" allowing new hires to choose their ideal mix of base, bonus and equity. Flexible and generous compensation packages have allowed Netflix to hire and retain the very best in the industry.

- Have multiple "why now" moments - Netflix made 2 (then) non-obvious and big bets. The first was on DVDs usurping VHS as the primary video storage format. The second was on eCommerce taking over brick-and-mortar as the best channel for acquiring customers in the video rental segment. The company rode multiple tailwinds, ensuring that even if one didn't pan out, the business would be able to succeed.

- Carrot not stick - the core philosophy of the company was to never piss off users, no matter what. This meant having systems in place that incented users and never penalized them. For example, early users were never fined for returning a DVD late; instead they simply couldn't get their next DVD until they returned their old one.

https://www.linkedin.com/feed/update/urn:li:activity:7086870579282104320/