Advanced English Skills
Word of the Day
haggle

Definition: (verb) To bargain, as over the price of something; dicker.
Synonyms: chaffer, higgle, huckster.
Usage: He preferred to be overcharged rather than to haggle over such a trivial item.

@EngSkills
Idiom of the Day
junkyard dog

An especially nasty, vicious, or savage person or animal (especially a dog). Of a person, often used in the phrase "meaner than a junkyard dog."

@EngSkills
Phrasal Verb of the Day | Vocabulary | EnglishClub
empty out

to remove everything from inside something

@EngSkills
Learn English Through Football
Euro 2024 Football Phrase day 23: Come from Behind

Our day 23 football phrase from Euro 2024 is to come from behind: a phrase that can be used to describe both games played on the day. Don’t forget we have hundreds more explanations of football language in our football glossary and we also have […]


@EngSkills
Language Log
Are LLMs writing PubMed articles?

Kyle Orland, "The telltale words that could identify generative AI text", ars technica 7/1/2024

In a pre-print paper posted earlier this month, four researchers from Germany's University of Tubingen and Northwestern University said they were inspired by studies that measured the impact of the COVID-19 pandemic by looking at excess deaths compared to the recent past. By taking a similar look at "excess word usage" after LLM writing tools became widely available in late 2022, the researchers found that "the appearance of LLMs led to an abrupt increase in the frequency of certain style words" that was "unprecedented in both quality and quantity."

To measure these vocabulary changes, the researchers analyzed 14 million paper abstracts published on PubMed between 2010 and 2024, tracking the relative frequency of each word as it appeared across each year. They then compared the expected frequency of those words (based on the pre-2023 trendline) to the actual frequency of those words in abstracts from 2023 and 2024, when LLMs were in widespread use.

The results found a number of words that were extremely uncommon in these scientific abstracts before 2023 that suddenly surged in popularity after LLMs were introduced. The word "delves," for instance, shows up in 25 times as many 2024 papers as the pre-LLM trend would expect; words like "showcasing" and "underscores" increased in usage by nine times as well. Other previously common words became notably more common in post-LLM abstracts: the frequency of "potential" increased 4.1 percentage points; "findings" by 2.7 percentage points; and "crucial" by 2.6 percentage points, for instance.
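
The comparison described above, expected frequency from the pre-2023 trendline versus observed frequency, can be sketched roughly as follows. This is only an illustration: the straight-line fit, the helper names `fit_line` and `excess_usage`, and the yearly frequencies are all assumptions made for the example, not the paper's actual model or data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def excess_usage(history, year, observed):
    """Compare an observed frequency with the trendline extrapolated to `year`.

    `history` maps pre-LLM years to the fraction of abstracts containing a word.
    """
    years = sorted(history)
    slope, intercept = fit_line(years, [history[y] for y in years])
    expected = slope * year + intercept
    return {
        "expected": expected,
        "gap": observed - expected,    # excess in absolute terms
        "ratio": observed / expected,  # excess in proportional terms
    }


# Hypothetical frequencies, invented purely for illustration.
history = {2018: 0.0010, 2019: 0.0011, 2020: 0.0012, 2021: 0.0013, 2022: 0.0014}
result = excess_usage(history, 2024, observed=0.0160)
print(result)
```

With these made-up numbers the trendline predicts a 2024 frequency of about 0.0016, so an observed 0.0160 is roughly ten times the expected value.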
The cited paper is Dmitry Kobak et al., "Delving into ChatGPT usage in academic writing through excess vocabulary", arXiv.org 7/3/2024:

Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How wide-spread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions on academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess words usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact in the scientific literature, surpassing the effect of major world events such as the Covid pandemic.

This claim may very well be right — I haven't evaluated their statistical model — but there are some things about the argument that leave me skeptical.

The first thing to note is that some of the cited increases are much more "abrupt" than others, with forms of the verb delve leading the list, as shown in their Figure 1: http://languagelog.ldc.upenn.edu/myl/AllegedLLMwords.png

And their Figure 2(a), captioned "Frequencies in 2024 and frequency ratios (r). Both axes are on log-scale": http://languagelog.ldc.upenn.edu/myl/AllegedLLMwords2.png

(I believe that the paper's authors downloaded the PubMed data and did their own searches and counts, but if you do your own exploration on the PubMed site, keep in mind that PubMed apparently searches by lemma, so that a search for "delves" also hits on "delve", "delving", "delved", which I'll signal in the usual way with square brackets, e.g. [delve]. And PubMed returns the number of citations (abstracts or available texts) per year containing the lemmas in question; for the relative frequency results, normalizing for the number of available citations per year, you can use Ed Sperr's github page. I haven't tried to separate the alternative forms, but that should not matter for the points I'm making below.)

My first observation is that decades-long trends in relative PubMed word usage are common, and not just because of real-world references like ebola, covid, and chatgpt. For example, [explore] has been gaining on [investigate] for 25 years or so, with acceleration over the past decade: http://languagelog.ldc.upenn.edu/myl/PubMed1.png http://languagelog.ldc.upenn.edu/myl/PubMed2.png It's also worth noting that trends of similar size (and often similar direction) can be found in more general sources than PubMed, e.g. Corpus of Historical American English: http://languagelog.ldc.upenn.edu/myl/COHAdelve2a.png http://languagelog.ldc.upenn.edu/myl/COHAdelve3.png And the next thing to notice is that the changes in [delve], though extremely large in proportional terms, are small in terms of actual citation frequency, e.g. 5,526 in 2024 for [delve] compared to 108,616 for [explore]: http://languagelog.ldc.upenn.edu/myl/PubMed3.png The proportional change for [delve] from 2022 to 2024 is indeed impressive (numbers below are from Ed Sperr's github page — the 2024 data is only for part of the year, obviously…)

* [delve]: citations 629 to 5,526, factor of 8.8; citations per 100k 35.37 to 591.33, factor of 16.7

But if every PubMed citation containing a form of [delve] were written (or edited) by ChatGPT, those 591.33 citations per 100k would amount to just 0.59% of the year's citations. That's a lot less than the "at least 10%" claimed by the article, which would throw us back into the evaluation of the overall statistical model.

And the proportional changes from 2022 to 2024 for their other chosen words are substantially smaller, e.g.

* [showcase]: citations 1,900 to 4,470, factor of 2.4; citations per 100k 106.85 to 478.74, factor of 4.48
* [surpass]: citations 1,984 to 4,348, factor of 2.2; citations per 100k 111.57 to 465.67, factor of 4.17
* [emphasize]: citations 17,945 to 22,151, factor of 1.2; citations per 100k 1009.13 to 2372.36, factor of 2.35
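
The factors quoted for these lemmas, and the 0.59% ceiling for [delve], are simple ratios and can be rechecked with a few lines of Python. The figures below are copied from the post; the helper names `change_factor` and `share_of_citations` are invented for this sketch.

```python
# 2022 and 2024 figures as quoted in the post (2024 covers only part of the year).
FIGURES = {
    "delve": {"citations": (629, 5526), "per_100k": (35.37, 591.33)},
    "showcase": {"citations": (1900, 4470), "per_100k": (106.85, 478.74)},
    "surpass": {"citations": (1984, 4348), "per_100k": (111.57, 465.67)},
    "emphasize": {"citations": (17945, 22151), "per_100k": (1009.13, 2372.36)},
}


def change_factor(before, after):
    """Proportional change between two yearly figures."""
    return after / before


def share_of_citations(per_100k):
    """Convert a citations-per-100k rate into a percentage of all citations."""
    return per_100k / 100_000 * 100


for lemma, figures in FIGURES.items():
    raw = change_factor(*figures["citations"])
    relative = change_factor(*figures["per_100k"])
    print(f"[{lemma}]: raw factor {raw:.1f}, per-100k factor {relative:.2f}")

# Ceiling on the share of 2024 citations that contain a form of [delve].
print(f"[delve] share of 2024 citations: {share_of_citations(591.33):.2f}%")
```

The per-100k factors are larger than the raw citation factors because the rate is normalized by the number of citations available for each year, and the 2024 data is partial.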

And [delve] has been gaining in relative frequency on (for example) [explore] since 2009, long before ChatGPT was available to researchers, even though [explore] has been gaining in popularity during that time: http://languagelog.ldc.upenn.edu/myl/PubMed4.png In the broader COHA collection, [delve] has been increasing in popularity since the 1940s: http://languagelog.ldc.upenn.edu/myl/COHAdelve1.png None of this explains [delve]'s proportional change factor of 16.7 from 2022 to 2024 in citations per 100k, but it does show that there are cultural trends (even fads) in word usage. And it's also not clear why ChatGPT should promote [delve] and not e.g. [probe] or [seek] or [sift] — though maybe it's because of who they hired for RLHF?

[See also "Bing gets weird — and (maybe) why" (2/16/2023); "Annals of AI bias" (9/23/2023) ]

@EngSkills
Slang of the Day | Vocabulary | EnglishClub
XYZ

"code" said to alert someone that their zipper, or fly, is open

@EngSkills
Word of the Day
extralegal

Definition: (adjective) Not permitted or governed by law.
Synonyms: nonlegal.
Usage: The vigilantes believed they were simply dispensing an extralegal form of frontier justice.

@EngSkills
Merriam-Webster's Word of the Day
dicker

Merriam-Webster's Word of the Day for July 8, 2024 is:

dicker • \DIK-er\  • verb

To dicker is to talk or argue with someone about the conditions of a purchase, agreement, or contract.

// My favorite thing about flea markets is dickering over prices.

Examples:

“They haggled and dickered and bargained through a good number of dealerships.” — Terry Woster, Tri-State Neighbor (Sioux Falls, South Dakota), 7 Dec. 2023
Did you know?

The origins of the verb dicker likely lie in an older dicker, the noun referring to a quantity of ten animal hides or skins. The idea is that the verb arose from the bartering of, and haggling over, animal hides on the American frontier. The noun dicker comes from decuria, the Latin word for a bundle of ten hides, and ultimately from the Latin word decem, meaning "ten." The word entered Middle English as dyker and by the 14th century had evolved to dicker.

@EngSkills
Phrasal Verb of the Day | Vocabulary | EnglishClub
sit up (2)

to not go to bed until later than usual

@EngSkills
Language Log
Meme collision of the week

Lauren Jack ("Do you hurkle-durkle? What the Scottish word taking over social media means and where it came from", The Scotsman 1/24/2024) embeds a TikTok video from 7/18/2023:

@devriebrynnme my Scottish ancestors = just chillin’ as a culture♬ original sound – Devriebrynn
Ben Zimmer quickly picked it up lexicographically — "To ‘Hurkle-Durkle,’ or Lounge in Bed, Is a TikTok Trend That’s 200 Years Old: A 19th-century Scottish rhyming phrase has resurfaced and gone viral", WSJ 3/1/2024. And a Google News search for the term turns up dozens of recent articles.

So I should have been ready for 6,272 words from Lance Eliot at forbes.com on how the "Trend Of ‘Hurkle-Durkling’ In Bed Gets Boosted To High Form Via Modern Generative AI" (7/7/2024). The article starts with a lot of standard thoughts about hurkle-durkling, morality, electronic media, mental health, and so on. But it does bring in generative AI, starting with this framing question:

"Modern-day generative AI and large language models (LLMs) are readily used while lounging around in bed. Does this then change the proposition underlying the considered negative perceptions of doing a hurkle-durkle? Should we reconsider the nature of hurkling-durkling?"

It continues with "a quickie backgrounder" on generative AI, for those who've been meditating in a cave for the past few years, and a list of "significant approaches that intertwine generative AI and hurkle-durkling". And it ends with "a series of dialogues with ChatGPT" about the topic, based on the prompts

* “What is hurkle-durkling?”
* “Is hurkle-durkle good or bad?”
* “Give an example of a person lying in bed that opts to do a hurkle-durkle and what is possibly going on in their mind as they do so.”
* “If a person was doing a hurkle-durkle, how might generative AI be of use to them during that time?”

I'm tempted to write a program that uses the same template to generate a long article about an arbitrary topic, but I have a feeling that Mr. Eliot has been there already.

@EngSkills
Language Log
A Romano-Sarmatian soldier in circa 2nd c. AD Britain

We have occasionally mentioned Sarmatians on Language Log, but usually in association with the Scythians, of whom we have often spoken (most recently here, with extensive bibliography).

These two peoples of ancient times both spoke languages in the Iranian language family and lived in the area north of the Black Sea. The languages and cultures of the Scythians and Sarmatians were related but distinct. In particular their styles of warfare were different. The Scythians were noted as mounted archers. They may have been the inventors or one of the inventors of the stirrup. The stirrup enabled mounted archers to fire (shoot) arrows reasonably accurately while riding. The Scythians attacked in a mass firing of arrows. If their adversaries were not overwhelmed by the hail of arrows then the Scythians turned and rode to a safe distance for regrouping to mount another mass attack.
Most adversaries were overwhelmed by the Scythian battle tactics. It was only the Sarmatians who found a successful counter-strategy to withstand the Scythians. The Sarmatian warriors and their mounts were protected with armor. Usually the armor consisted of metal plates of bronze or iron sewn onto leather garments. This armor enabled the Sarmatians to withstand a Scythian attack. After a Scythian onslaught the Sarmatians would attack the Scythians with fifteen-foot-long lances. The Sarmatians were probably the originator of the armored knights of medieval Europe.

(source)

Before focusing on the single ca. 2nd c. AD Sarmatian who is the main subject of this post, we would do well to learn more about the Sarmatians themselves.

The Sarmatians (/sɑːrˈmeɪʃiənz/; Ancient Greek: Σαρμάται, romanized: Sarmatai; Latin: Sarmatae [ˈsarmatae̯]) were a large confederation of ancient Iranian equestrian nomadic peoples who dominated the Pontic steppe from about the 3rd century BC to the 4th century AD.

The earliest reference to the Sarmatians is in the Avesta, Sairima-, which is in the later Iranian sources recorded as *Sarm and Salm. Originating in the central parts of the Eurasian Steppe, the Sarmatians were part of the wider Scythian cultures.[3] They started migrating westward around the fourth and third centuries BC, coming to dominate the closely related Scythians by 200 BC. At their greatest reported extent, around 100 BC, these tribes ranged from the Vistula River to the mouth of the Danube and eastward to the Volga, bordering the shores of the Black and Caspian seas as well as the Caucasus to the south.

(Wikipedia)

Now we have a detailed scientific report about one of those Sarmatian soldiers who made the roughly 1,500 mile trek to Romano-Britain during the early part of the first millennium AD: "Ancient Skeleton From Southern Russia Surprises UK Scientists", by Sam Anderson, ExplorersWeb (December 27, 2023)

Offord Cluny 203645 was a complete, well-preserved male skeleton, buried without any personal effects in a Cambridgeshire ditch. A team led by the Francis Crick Institute could tell the remains were clearly ancient. But with no contextual clues to go on, they might have hit a dead end.

Updated forensic technology intervened, and provided the first biological proof of a certain, far-flung immigration pattern during the Roman Empire.

The man was a Sarmatian, and the team’s tests proved he made it from his homeland in what is now the southern Russia/Ukraine area to his final destination in the United Kingdom.

The article explains how the archeologists found where the man came from:

First, they extracted DNA from a tiny bone in his inner ear. This turned out to be his best-preserved body part containing the most complete DNA samples. Dr. Marina Silva, of the Ancient Genomics Laboratory at the Francis Crick Institute, extracted and analyzed the samples for the study.

“This is not like testing the DNA of someone alive,” Silva told The BBC. “The DNA is very fragmented and damaged. However, we were able to [decode] enough of it. The first thing we saw was that genetically he was very different” from the Romano-British individuals they’d previously studied.



That still didn’t connect the dots, though. How could the scientists prove that he was born in Eurasia and immigrated to the place of his death?

For this, they examined his teeth. Even two millennia after his death, the tissue harbored chemicals in varying amounts at different layers. Offord Cluny underwent pronounced dietary changes at ages 5 and 9 and began to level out around 13.



The changes, the team found, followed chemical trends you could expect from a person adapting to available food sources while traveling west across Europe.

Millets and sorghum grains, scientifically called C4 crops, are plentiful in the region where Sarmatians lived. These dissipated in his diet as he matured. Wheat — more common in Western Europe — replaced them.

“The [analysis] tells us that he, and not his ancestors, made the journey to Britain. As he grew up, he migrated west, and these plants disappeared from his diet,” said Janet Montgomery of Durham University.

These results are extremely interesting and important because they show that Offord Cluny made this long trip from the Pontic-Caspian steppes to Britain, not just in one lifetime, but within the period of a few years.

The Iranian-speaking peoples who were present in Britain during the Roman period had a profound impact on many aspects of culture, e.g., the Arthurian story cycles and their associated images.  Some of these men participated in the defense of Hadrian's Wall (begun in AD 122).

"The Sarmatians in Europe: Gravestone of a Sarmatian Horseman"

The term "Sarmatians" is believed to refer to various horse-riding peoples from the territory of present-day Iran. From the 3rd century BC, they settled in present-day southern Russia and Ukraine, where they displaced the Scythians. From the 3rd century onward, Sarmatian tribes also settled in the Roman Empire, often adopted Roman citizenship and served in Roman legions, having been hired as auxiliary troops. In Britain, for example, the Sarmatians defended Hadrian's Wall against the attacks of the Scottish Picts. The photograph shows the gravestone of a Sarmatian horseman from the Roman settlement of Deva Victrix (in present-day Chester in northern England).
https://www.ieg-ego.eu/illustrationen/der-noerdliche-schwarzmeerraum/die-sarmaten-in-europa-grabstein-eines-sarmatischen-reiters/@@images/cc363e14-292a-4ba9-a5be-a45ab712361d.jpeg
Gravestone of a Sarmatian horseman who fought for the Romans in Britain, Grosvenor Museum, Chester, England, colour photograph, 2011, photographer: Wolfgang Sauber; image source: Wikimedia Commons, Creative Commons Some Rights Reserved Creative Commons Attribution-Share Alike 3.0 Unported.

For a masterful treatment of the impact of Romano-Iranian forces on English tradition, see:

C. Scott Littleton and Linda A. Malcor, From Scythia to Camelot: A Radical Reassessment of the Legends of King Arthur, the Knights of the Round Table, and the Holy Grail (New York and London: Garland, 1994; rev. pb. 2000). In the British journal, Religion, 28.3 (July, 1998), 294-300, I [VHM] wrote a review in which I pointed out that the celebrated motif of a mighty arm rising up out of the water holding aloft the hero's sword can also be found in a medieval Chinese tale from Dunhuang. That review is available electronically from ScienceDirect, if your library subscribes to it. Otherwise, I think this version on the Web is a fairly faithful copy.

Selected readings

* "A medieval Dunhuang man" (7/17/23)
* "The Ossetes" (7/25/21)
* "Ashkenazi and Scythians" (7/13/21)
* "Research reveals man born thousands of miles to the east traveled to Cambridgeshire 2,000 years ag[...]
Slang of the Day | Vocabulary | EnglishClub
mo

moment

@EngSkills
Word of the Day
lidless

Definition: (adjective) Watchful; vigilant.
Synonyms: sleepless.
Usage: He was vigilant—a lidless watcher of the public weal—and took great care to make sure that all was well with his neighbors.

@EngSkills