God of Prompt
RT @godofprompt: 🚨 DeepMind discovered that neural networks can train for thousands of epochs without learning anything.
Then suddenly, in a single epoch, they generalize perfectly.
This phenomenon is called "Grokking".
It went from a weird training glitch to a core theory of how models actually learn.
Here’s what changed (and why this matters now):
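The delayed-generalization effect described above is easy to probe on small algorithmic tasks. Below is a minimal sketch in PyTorch of the commonly cited setup (modular addition with a tiny network and strong weight decay); the architecture and hyperparameters are illustrative assumptions, not the exact configuration from any particular paper, and whether and when the late jump in test accuracy appears depends on those choices.

```python
# Sketch of a grokking-style experiment: modular addition, tiny MLP,
# strong weight decay. Hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

P = 97  # task: predict (a + b) mod P from the pair (a, b)
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P

# Grokking is typically reported when only a fraction of the data is used for training.
perm = torch.randperm(len(pairs))
train_idx, test_idx = perm[: len(pairs) // 2], perm[len(pairs) // 2:]

model = nn.Sequential(
    nn.Linear(2 * P, 256), nn.ReLU(),
    nn.Linear(256, P),
)

def encode(batch):
    # One-hot encode both operands and concatenate.
    return torch.cat(
        [F.one_hot(batch[:, 0], P).float(), F.one_hot(batch[:, 1], P).float()],
        dim=-1,
    )

# Weight decay (here via AdamW) is the ingredient most often linked to the
# late jump in test accuracy.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(20_000):
    opt.zero_grad()
    logits = model(encode(pairs[train_idx]))
    loss = F.cross_entropy(logits, labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            train_acc = (logits.argmax(-1) == labels[train_idx]).float().mean()
            test_acc = (model(encode(pairs[test_idx])).argmax(-1)
                        == labels[test_idx]).float().mean()
        # Typical grokking pattern: train accuracy saturates early, test
        # accuracy sits near chance for a long time, then climbs sharply.
        print(f"step {step:6d}  train {train_acc:.2f}  test {test_acc:.2f}")
```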
memenodes
Porn addiction is so crazy like how you addicted to other nig*as getting pussy?
STOP WATCHING PORN!!
STOP WATCHING PORN!!!
STOP WATCHING PORN!!
YES YOU!!👀👀..STOP IT!!
STOP WATCHING PORN!! - m (@skitzocat)
Brady Long
If you’re building agents, this matters more than yet another bigger-model launch.
MiroThinker 1.5 is about agentic density: better reasoning per parameter, lower cost, and more controllable behavior.
Explore: https://t.co/mlOCZjARcI
https://t.co/q3P6xXXYhW
We just flipped the scaling narrative: Agentic Density > Parameter Count.
#MiroThinker 1.5 operationalizes Interactive Scaling—agents that seek evidence, iterate, and revise in real time (with a time-sensitive sandbox to avoid hindsight leakage).
Result: a 30B model hitting frontier-class agentic search at ~$0.07/query (≈20× cheaper than 1T-class baselines).
Fully open source, read more: https://t.co/m4HzxiidRX
Try: https://t.co/dwKmu3O9t7
GH:https://t.co/u4VhL8o8Gt
HF: https://t.co/ClqKRrQn6R - MiroMindAI
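To make "Interactive Scaling" concrete: the quoted post describes agents that spend inference-time effort seeking evidence, iterating, and revising rather than answering in one shot. The sketch below is a generic evidence-seeking loop with stubbed model and search functions; it is an assumption-laden illustration of that pattern, not MiroThinker's actual implementation or API.

```python
# Generic sketch of an evidence-seeking agent loop ("interactive scaling"
# pattern). model_call and web_search are stubs standing in for a real LLM
# and a real (ideally time-restricted) search tool; not MiroThinker's API.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    question: str
    evidence: list[str] = field(default_factory=list)
    draft: str = ""

def model_call(prompt: str) -> str:
    """Stub for an LLM call; a real agent would query a model here."""
    # Toy policy: ask for one round of evidence, then answer.
    if "Evidence so far: []" in prompt:
        return "SEARCH: example query"
    return "ANSWER: draft grounded in the collected snippets"

def web_search(query: str) -> str:
    """Stub for a search/sandbox tool (time-restricted in the post's setup
    to avoid hindsight leakage)."""
    return f"snippet for {query!r}"

def run_agent(question: str, max_steps: int = 5) -> str:
    state = AgentState(question=question)
    for _ in range(max_steps):
        prompt = (
            f"Question: {state.question}\n"
            f"Evidence so far: {state.evidence}\n"
            f"Current draft: {state.draft}\n"
            "Either request evidence (SEARCH: <query>) or finalize (ANSWER: <text>)."
        )
        action = model_call(prompt)
        if action.startswith("SEARCH:"):
            # Seek evidence, then loop back and revise.
            state.evidence.append(web_search(action.removeprefix("SEARCH:").strip()))
        else:
            state.draft = action.removeprefix("ANSWER:").strip()
            break
    return state.draft

if __name__ == "__main__":
    print(run_agent("What changed in MiroThinker 1.5?"))
```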
memenodes
Me going to another city because I'm too shy to ask the driver to stop https://t.co/ArDJQRHlU1
memenodes
Me when I say “Internet capital markets” instead of meme coins https://t.co/BFN5fYk1Gd
God of Prompt
RT @rryssf_: This paper from BMW Group and Korea’s top research institute exposes a blind spot almost every enterprise using LLMs is walking straight into.
We keep talking about “alignment” like it’s a universal safety switch.
It isn’t.
The paper introduces COMPASS, a framework that shows why most AI systems fail not because they’re unsafe, but because they’re misaligned with the organization deploying them.
Here’s the core insight.
LLMs are usually evaluated against generic policies: platform safety rules, abstract ethics guidelines, or benchmark-style refusals.
But real companies don’t run on generic rules.
They run on internal policies:
- compliance manuals
- operational playbooks
- escalation procedures
- legal edge cases
- brand-specific constraints
And these rules are messy, overlapping, conditional, and full of exceptions.
COMPASS is built to test whether a model can actually operate inside that mess.
Not whether it knows policy language, but whether it can apply the right policy, in the right context, for the right reason.
The framework evaluates models on four things that typical benchmarks ignore:
1. policy selection: When multiple internal policies exist, can the model identify which one applies to this situation?
2. policy interpretation: Can it reason through conditionals, exceptions, and vague clauses instead of defaulting to overly safe or overly permissive behavior?
3. conflict resolution: When two rules collide, does the model resolve the conflict the way the organization intends, not the way a generic safety heuristic would?
4. justification: Can the model explain its decision by grounding it in the policy text, rather than producing a confident but untraceable answer?
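The thread doesn't give COMPASS's concrete scoring format, so here is a hedged sketch of how an evaluation along those four axes might be wired up: each test case carries a gold policy ID, a gold decision, and the clause the decision should be grounded in, and the harness scores a model's structured answer against them. The field names and scoring rules are assumptions for illustration, not the paper's actual schema.

```python
# Hedged sketch of a policy-grounded evaluation harness in the spirit of
# the four axes above (selection, interpretation/decision, justification).
# The schema and scoring are illustrative assumptions, not COMPASS's format.
from dataclasses import dataclass

@dataclass
class PolicyCase:
    scenario: str            # the situation put to the model
    gold_policy_id: str      # which internal policy should govern it
    gold_decision: str       # e.g. "approve", "escalate", "refuse"
    gold_clause: str         # clause the decision must be grounded in

@dataclass
class ModelAnswer:
    policy_id: str           # which policy the model chose to apply
    decision: str            # what it decided
    justification: str       # free-text rationale

def score_case(case: PolicyCase, answer: ModelAnswer) -> dict:
    """Score one case along selection, decision, and grounding."""
    selected_right_policy = answer.policy_id == case.gold_policy_id
    decided_right = answer.decision == case.gold_decision
    # Crude grounding check: does the rationale reference the clause the
    # organization cares about? (A real harness would use a more robust
    # match or a judge model.)
    grounded = case.gold_clause.lower() in answer.justification.lower()
    return {
        "policy_selection": selected_right_policy,
        "decision_correct": decided_right,
        "justified": grounded,
        # The failure mode the thread highlights: right policy, wrong
        # application of it.
        "reasoning_failure": selected_right_policy and not decided_right,
    }

# Example usage with a toy case and a hypothetical model output.
case = PolicyCase(
    scenario="Customer asks for a refund 45 days after purchase.",
    gold_policy_id="returns-v3",
    gold_decision="escalate",
    gold_clause="refunds after 30 days require supervisor approval",
)
answer = ModelAnswer(
    policy_id="returns-v3",
    decision="refuse",
    justification="Refunds are only allowed within 30 days.",
)
print(score_case(case, answer))
# -> selection correct, decision wrong: a reasoning failure, not a knowledge one.
```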
One of the most important findings is subtle and uncomfortable:
Most failures were not knowledge failures.
They were reasoning failures.
Models often had access to the correct policy but:
- applied the wrong section
- ignored conditional constraints
- overgeneralized prohibitions
- or defaulted to conservative answers that violated operational goals
From the outside, these responses look “safe.”
From the inside, they’re wrong.
This explains why LLMs pass public benchmarks yet break in real deployments.
They’re aligned to nobody in particular.
The paper’s deeper implication is strategic.
There is no such thing as “aligned once, aligned everywhere.”
A model aligned for an automaker, a bank, a hospital, and a government agency is not one model with different prompts.
It’s four different alignment problems.
COMPASS doesn’t try to fix alignment.
It does something more important for enterprises:
it makes misalignment measurable.
And once misalignment is measurable, it becomes an engineering problem instead of a philosophical one.
That’s the shift this paper quietly pushes.
Alignment isn’t about being safe in the abstract.
It’s about being correct inside a specific organization’s rules.
And until we evaluate that directly, most “production-ready” AI systems are just well-dressed liabilities.