
Aircraft


More Range
ALIA has flown more piloted miles than any other eVTOL through flight tests and partner engagements.

Across The Country
ALIA has flown through, landed in, and said hello to more than half of the United States and parts of Canada. Check out the Barnstorm here.

Pilot-Tested
A squadron of pilots has flown ALIA, including those representing our partners, investors, and fellow aviation enthusiasts.


Covid update in England - new variants are dominant but are NOT causing a wave


Towards the end of May, I wrote a Substack post about two emerging variants, NB.1.8.1 and XFG. NB.1.8.1 was causing waves in Singapore, Hong Kong, Taiwan and Australia, while XFG was also showing strong growth in some regions. At the time, Covid hospitalisations in England and wastewater monitoring in Scotland had shown stable and low levels for six months (since November 2024) - our longest stable and low period since the start of the pandemic.

I thought it was reasonably likely that one or both of these variants would cause a summer wave here, although I said I wasn’t particularly worried about either in terms of their potential to cause higher levels of severe illness. Two months later, these variants are now the most common across the world (in different ratios in different countries), including in the UK. They likely reached dominance in England around mid-June (e.g. see the last variant chart from UKHSA below - note that sample numbers are tiny towards the end, but the picture is consistent with global patterns).

However, whether because it’s been a hot summer plus enough existing immunity in the population, or something else, they just haven’t caused a big wave of infections here. Wastewater monitoring in Scotland shows a small bump in the last month, but that is already declining - compared to the whopper of a wave we had a year ago, this really is not very significant. Remember that wastewater testing does not depend on test-seeking behaviour or symptoms, so it is the best guide we have to the scale and trend of infection in the community.

Even better, the impact of these variants on recorded hospitalisations in England has been negligible, and numbers have fallen in the most recent week of data. (NOTE: monitoring of hospital admissions for RSV and Flu has ended for the season).

This is good news, helped along by the fact that these variants arrived not too long after the Spring vaccine booster campaign, which provided extra protection to many elderly or otherwise clinically vulnerable people (although not as many as we’d like to see boosted every year).

We should be glad about this - as far as I’m concerned, the longer we go without a large Covid wave, the better. It doesn’t mean that no one is getting Covid - it’s there in the background and doesn’t almost disappear in the summer like Flu and RSV. It also doesn’t mean that there won’t be another big Covid wave in the future. But at the moment, things are calm on the Covid Front and long may they stay that way.

PS for a far more detailed look at Covid data, please do check out Bob Hawkins’ weekly substack.



Carney government’s lack of vision on immigration file worries experts


The Carney government has laid out ambitious plans on housing affordability, defence spending and international trade. But on one of Canada’s most consequential policy files — immigration — experts say Ottawa has yet to offer a vision of its own.

“I don’t think that the Carney administration has really done much thinking yet about immigration,” Daniel Hiebert, professor emeritus in the University of British Columbia’s department of geography, told Canadian Affairs in an interview in May.

“I think they’ve got a whole bunch of high-priority issues, and I think immigration is not near the top of that list.”

Some experts say the absence of a clear vision on immigration risks overlooking Canada’s long-term economic and social needs. 

Immigration is “the policy lever that more than any other determines what the future of Canada’s going to look like,” Hiebert said. 

In 2016, an influential federal advisory council recommended Canada increase its annual immigration intake from 300,000 to 450,000 to offset its aging demographic and sustain economic growth. 

The Trudeau government embraced that advice, gradually boosting permanent residency admissions over several years to a high of nearly 485,000 in 2024. 

But amidst a public backlash over housing affordability and service capacity constraints, the government reversed course in late 2024. 

The Carney government has largely continued these targets. It has stabilized permanent residency admissions at below one per cent of the total population, effectively halting further population growth. And it has pledged to reduce Canada’s reliance on temporary residents, introducing a five per cent cap on the size of Canada’s non-permanent population. 

“They’re just simply following the policy legacy of the Trudeau government,” Hiebert said. 

The demographic trends that informed the advisory council’s recommendations in 2016 remain true today. In fact, Canada’s fertility rate fell to a record low of 1.3 in 2022, far below the 2.1 replacement level. 

Lisa Lalande, CEO of the Century Initiative, a non-profit that advocates for population growth, says Canada’s current immigration levels fail to address the demographic and economic challenges Canada faces.

“We have an aging population and a crashing fertility rate — and a dependency ratio that ultimately means we are going to have difficulty in the future with regards to meeting our labour needs,” she said. 

A 2024 Parliamentary Budget Officer report highlights the challenges. It projects the proportion of individuals aged 65 and older relative to the working-age population will increase from nearly 30 per cent in 2023 to nearly 50 per cent by 2098. And this projection was based on the government’s higher, 2023 immigration targets.

The Century Initiative has proposed Canada raise permanent residency admissions to between 1.15 and 1.25 per cent of the national population each year — a higher target than the Carney government’s current cap, which sits just below one per cent. At current population levels, this would mean admitting between 478,000 and 519,000 permanent residents a year.
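As a rough check on that arithmetic, here is a minimal back-of-envelope sketch in Python. The base population of roughly 41.5 million is our assumption for illustration; the article does not state the population figure behind its range.

```python
# Back-of-envelope check of the admissions range quoted above.
# ASSUMPTION: a base population of ~41.5 million (roughly Canada's 2024
# population); the article does not state the figure it used.
population = 41_500_000

for rate in (0.0115, 0.0125):  # 1.15% and 1.25% of the population per year
    admissions = population * rate
    print(f"{rate:.2%} of {population:,} -> {admissions:,.0f} admissions per year")

# Prints ~477,250 and ~518,750, in line with the article's 478,000-519,000.
```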

“Canada needs a stronger plan that looks at the issues more holistically and understands immigration as central to a strategic plan for growth,” said Lalande. 

Splitting the pie

Other voices are more circumspect about the need for sustained, high immigration levels.

In December 2024, former Bank of Canada governor Stephen Poloz said in a webinar that Canada is in a recession. He said the weakness of Canada’s economy has been masked by high population growth.

“A technical [recession] is a superficial definition that you have two quarters of negative growth in a row, and we haven’t had that,” Poloz said. “But the reason is because we’ve been swamped with new immigrants who buy the basics in life, and that boosts our consumption enough.” 

A January report by the PBO projected that Ottawa’s reduced 2024 immigration targets would lower real GDP by 1.7 per cent in 2027. But it also projected that real GDP per capita would be 1.4 per cent higher in the same year. 

In effect, high population growth can decrease the amount of economic activity per person.

“In a sense, it’s the size of the pie,” explained Lars Osberg, an economics professor at Dalhousie University. “If you have to divide the pie among more people, each slice of the pie is on average going to be smaller.” 
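A toy calculation makes the pie analogy concrete. All numbers below are invented purely for illustration; none come from the article.

```python
# Toy numbers showing how total GDP can grow while GDP per capita shrinks
# when population grows faster than output.
gdp, population = 2_000_000_000_000, 40_000_000  # a $2T economy, 40M people

gdp_next = gdp * 1.01                # total output grows 1%
population_next = population * 1.03  # population grows 3%

print(f"per capita before: ${gdp / population:,.0f}")            # $50,000
print(f"per capita after:  ${gdp_next / population_next:,.0f}")  # $49,029
```

Total GDP is higher in the second year, yet each person's slice of the pie is smaller, which is exactly the dynamic Poloz and Osberg describe.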

Real GDP per capita — which has declined in five of the past six quarters in Canada — is well below its long-term trend, a gap that equates to a decline in living standards of about $4,200 a person, according to a recent Statistics Canada report.

Osberg particularly worries about the effect of low-skilled immigration on workers’ average incomes.

“If you bring in a whole bunch of people to work at Canadian Tire or Tim Hortons or all these relatively low paid service sector jobs, you increase GDP, but you increase it by less than the average of all incomes. And so that pulls the average down,” he said.

Social friction

Eric Kaufmann, a Canadian professor of politics at the University of Buckingham, says high immigration levels also have far-reaching consequences for national identity and social trust. 

“The more diverse the population, the less people trust each other,” he said. 

Kaufmann draws on decades of international research showing that rapid demographic change can strain civic engagement and weaken collective identity, especially if policymakers fail to address these shifts openly.

The Carney government has defended its current immigration strategy.

An Immigration, Refugees and Citizenship Canada spokesperson told Canadian Affairs in an emailed statement that the government is managing immigration levels in a “balanced, sustainable” way. 

“Successful immigration requires an alignment between immigration levels and the ability to properly welcome newcomers with housing, accessible healthcare and education,” the spokesperson said. 

The spokesperson also emphasized the government’s efforts to transition skilled temporary residents already in Canada to permanent status. 

“Expected to represent over 40 per cent of overall permanent resident admissions in 2025, these people are skilled, educated and integrated into Canadian society,” the spokesperson said.

“They will continue to support the workforce and economy without placing additional demands on services, as they are already settled in Canada with jobs and housing.”

Kaufmann says many Canadian policymakers and academics avoid critical discussion of immigration policy because they can be professionally ostracized for doing so.

But Kaufmann suggests there is a public appetite for more frank discussion about immigration policy. Without it, the gap between public sentiment and elite consensus on immigration policy could grow and lead to social instability, he says.

“What you’re getting is a buildup of discontent,” said Kaufmann. “But until it’s released, we won’t know how profound it is.”

In Osberg’s view, there is not merely an appetite but a need for such discussions.

“One of the things I kind of regret about the way in which we organize ourselves in Canada is that we don’t seem to be able to have an informed public debate on those issues,” said Osberg. He noted there was no debate about the high immigration levels introduced under the Trudeau government. 

“[Immigration levels] get decided and announced as targets,” he said.

“You can think of a whole bunch of important questions for democracy, but one of the big questions has got to be, how big should we be?”



Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - METR


We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower. We view this result as a snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to rapidly evolve, we plan on continuing to use this methodology to help estimate AI acceleration from AI R&D automation.[1]

See the full paper for more detail.

[Figure: Forecasted vs. observed slowdown]

Motivation

While coding/agentic benchmarks[2] have proven useful for understanding AI capabilities, they typically sacrifice realism for scale and efficiency—the tasks are self-contained, don’t require prior context to understand, and use algorithmic evaluation that doesn’t capture many important capabilities. These properties may lead benchmarks to overestimate AI capabilities. In the other direction, because benchmarks are run without live human interaction, models may fail to complete tasks despite making substantial progress, because of small bottlenecks that a human would fix during real usage. This could cause us to underestimate model capabilities. Broadly, it can be difficult to directly translate benchmark scores to impact in the wild.

One reason we’re interested in evaluating AI’s impact in the wild is to better understand AI’s impact on AI R&D itself, which may pose significant risks. For example, extremely rapid AI progress could lead to breakdowns in oversight or safeguards. Measuring the impact of AI on software developer productivity gives complementary evidence to benchmarks that is informative of AI’s overall impact on AI R&D acceleration.

Methodology

To directly measure the real-world impact of AI tools on software development, we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years. Developers provide lists of real issues (246 total) that would be valuable to the repository—bug fixes, features, and refactors that would normally be part of their regular work. Then, we randomly assign each issue to either allow or disallow use of AI while working on the issue. When AI is allowed, developers can use any tools they choose (primarily Cursor Pro with Claude 3.5/3.7 Sonnet—frontier models at the time of the study); when disallowed, they work without generative AI assistance. Developers complete these tasks (which average two hours each) while recording their screens, then self-report the total implementation time they needed. We pay developers $150/hr as compensation for their participation in the study.
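To make the design concrete, here is a minimal sketch in Python of the random assignment and a simple ratio-of-geometric-means estimate. The issue times below are fabricated placeholders, and this is not METR's analysis code; the paper's actual estimate comes from a more careful statistical model.

```python
import math
import random
import statistics

random.seed(0)
issues = [f"issue-{i}" for i in range(246)]  # 246 real issues in the study

# Each issue is randomly assigned to allow or disallow AI tools.
assignment = {i: random.choice(["ai_allowed", "ai_disallowed"]) for i in issues}

# Self-reported implementation times in hours (fabricated placeholders;
# real issues averaged about two hours each).
times = {i: random.lognormvariate(math.log(2.0), 0.5) for i in issues}

def geometric_mean(xs):
    return math.exp(statistics.fmean(math.log(x) for x in xs))

gm = {arm: geometric_mean([times[i] for i in issues if assignment[i] == arm])
      for arm in ("ai_allowed", "ai_disallowed")}

# Positive means AI-allowed issues took longer; the study measured about +19%.
slowdown = gm["ai_allowed"] / gm["ai_disallowed"] - 1
print(f"estimated slowdown: {slowdown:+.0%}")
```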

[Figure: Methodology overview]

Core Result

When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

Below, we show the raw average developer forecasted times, and the observed implementation times—we can clearly see that developers take substantially longer when they are allowed to use AI tools.
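One way to see how stark the perception gap is: convert the percentages into time multipliers on a typical two-hour issue. This sketch assumes an "X% speedup" means X% less time, which may not match the paper's exact definition.

```python
# ASSUMPTION: "X% speedup" is read here as X% less time.
expected = 1 - 0.24  # forecast: tasks take 0.76x as long with AI
believed = 1 - 0.20  # post-study belief: 0.80x as long
observed = 1 + 0.19  # measured: 1.19x as long

baseline_hours = 2.0  # issues averaged roughly two hours each
print(f"expected with AI: {baseline_hours * expected:.2f} h")  # 1.52 h
print(f"believed with AI: {baseline_hours * believed:.2f} h")  # 1.60 h
print(f"observed with AI: {baseline_hours * observed:.2f} h")  # 2.38 h
```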

[Figure: Forecasted times vs. observed implementation times]

Given both the importance of understanding AI capabilities/risks, and the diversity of perspectives on these topics, we feel it’s important to forestall potential misunderstandings or over-generalizations of our results. We list claims that we do not provide evidence for in Table 2.

We do not provide evidence that: AI systems do not currently speed up many or most software developers.
Clarification: We do not claim that our developers or repositories represent a majority or plurality of software development work.

We do not provide evidence that: AI systems do not speed up individuals or groups in domains other than software development.
Clarification: We only study software development.

We do not provide evidence that: AI systems in the near future will not speed up developers in our exact setting.
Clarification: Progress is difficult to predict, and there has been substantial AI progress over the past five years.[3]

We do not provide evidence that: There are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting.
Clarification: Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup.

Factor Analysis

We investigate 20 potential factors that might explain the slowdown, finding evidence that 5 likely contribute:

[Figure: Factor analysis table]

We rule out many experimental artifacts—developers used frontier models, complied with their treatment assignment, didn’t differentially drop issues (e.g. dropping hard AI-disallowed issues, reducing the average AI-disallowed difficulty), and submitted similar quality PRs with and without AI. The slowdown persists across different outcome measures, estimator methodologies, and many other subsets/analyses of our data. See the paper for further details and analysis.

Discussion

So how do we reconcile our results with impressive AI benchmark scores, and anecdotal reports of AI helpfulness and widespread adoption of AI tools? Taken together, evidence from these sources gives partially contradictory answers about the capabilities of AI agents to usefully accomplish tasks or accelerate humans. The following table breaks down these sources of evidence and summarizes what each shows. Note that this is not intended to be comprehensive—we mean to very roughly gesture at some salient differences.

Our RCT
- Task type: PRs from large, high-quality open-source codebases
- Task success definition: Human user is satisfied code will pass review - including style, testing, and documentation requirements
- AI type: Chat, Cursor agent mode, autocomplete
- Observations: Models slow down humans on 20min-4hr realistic coding tasks

Benchmarks like SWE-Bench Verified, RE-Bench
- Task type: SWE-Bench Verified: open-source PRs with author-written tests; RE-Bench: manually crafted AI research problems with algorithmic scoring metrics
- Task success definition: Algorithmic scoring (e.g. automated test cases)
- AI type: Typically fully autonomous agents, which may sample millions of tokens, use complicated agent scaffolds, etc.
- Observations: Models often succeed at benchmark tasks that are very difficult for humans

Anecdotes and widespread AI adoption
- Task type: Diverse
- Task success definition: Human user finds code useful (potentially as a throwaway prototype or ~single-use research code)
- AI type: Various models and tools
- Observations: Many people (although certainly not all) report finding AI very helpful for substantial software tasks taking them >1hr, across a wide range of applications


Reconciling these different sources of evidence is difficult but important, and in part it depends on what question we’re trying to answer. To some extent, the different sources represent legitimate subquestions about model capabilities - for example, we are interested in understanding model capabilities both given maximal elicitation (e.g. sampling millions of tokens or tens/hundreds of attempts/trajectories for every problem) and given standard/common usage. However, some properties can make the results invalid for most important questions about real-world usefulness—for example, self-reports may be inaccurate and overoptimistic.

Here are a few of the broad categories of hypotheses for how these observations could be reconciled that seem most plausible to us (this is intended to be a very simplified mental model):

AI slows down experienced open-source developers in our RCT, but demonstrates impressive benchmark scores and anecdotally is widely useful.

[Figure: Summary diagram]

Benchmark results and anecdotes are basically correct, and there's some unknown methodological problem or properties of our setting that are different from other important settings.

[Figure: RCT underestimates diagram]

Our RCT results are basically correct, and the benchmark scores and anecdotal reports are overestimates of model capability (possibly each for different reasons)

[Figure: Benchmarks overestimate capabilities diagram]

All three methodologies are basically correct, but are measuring subsets of the "real" task distribution that are more or less challenging for models

[Figure: Mix diagram]

In these sketches, red differences between a source of evidence and the “true” capability level of a model represent measurement error or biases that cause the evidence to be misleading, while blue differences (i.e. in the “Mix” scenario) represent valid differences in what different sources of evidence represent, e.g. if they are simply aiming at different subsets of the distribution of tasks.

Using this framework, we can consider evidence for and against various ways of reconciling these different sources of evidence. For example, our RCT results are less relevant in settings where you can sample hundreds or thousands of trajectories from models, which our developers typically do not try. It also may be the case that there are strong learning effects for AI tools like Cursor that only appear after several hundred hours of usage—our developers typically only use Cursor for a few dozen hours before and during the study. Our results also suggest that AI capabilities may be comparatively lower in settings with very high quality standards, or with many implicit requirements (e.g. relating to documentation, testing coverage, or linting/formatting) that take humans substantial time to learn.

On the other hand, benchmarks may overestimate model capabilities by only measuring performance on well-scoped, algorithmically scorable tasks. And we now have strong evidence that anecdotal reports/estimates of speed-up can be very inaccurate.

No measurement method is perfect—the tasks people want AI systems to complete are diverse, complex, and difficult to rigorously study. There are meaningful tradeoffs between methods, and it will continue to be important to develop and use diverse evaluation methodologies to form a more comprehensive picture of the current state of AI, and where we’re heading.

Going Forward

We’re excited to run similar versions of this study in the future to track trends in speedup (or slowdown) from AI, particularly as this evaluation methodology may be more difficult to game than benchmarks. If AI systems are able to substantially speed up developers in our setting, this could signal rapid acceleration of AI R&D progress generally, which may in turn lead to proliferation risks, breakdowns in safeguards and oversight, or excess centralization of power. This methodology gives complementary evidence to benchmarks, focused on realistic deployment scenarios, which helps us understand AI capabilities and impact more comprehensively compared to relying solely on benchmarks and anecdotal data.

Get in touch!

We’re exploring running experiments like this in other settings—if you’re an open-source developer or company interested in understanding the impact of AI on your work, reach out.


New US directive for visa applicants turns social media feeds into political documents


In recent weeks, the US State Department implemented a policy requiring all university, technical training, or exchange program visa applicants to disclose their social media handles used over the past five years. The policy also requires these applicants to set their profiles to public.

This move is an example of governments treating a person’s digital persona as their political identity. In doing so, they risk punishing lawful expression, targeting minority voices, and redefining who gets to cross borders based on how they behave online.

Anyone seeking one of these visas will have their social media searched for “indications of hostility” towards the citizens, culture or founding principles of the United States. This enhanced vetting is supposed to ensure the US does not admit anyone who may be deemed a threat.

However, this policy changes how a person’s online presence is evaluated in visa applications, and it raises serious ethical concerns around privacy, freedom of expression, and the politicisation of digital identities.

The Trump administration has previously taken aim at higher education with the goal of changing the ideological slant of these institutions, including making changes to international student enrolment and the role of foreign nationals in US research institutions.

Digital rights advocates have expressed concerns this new requirement could lead to self-censorship and hinder freedom of expression.

It is unknown exactly which specific online actions will trigger a visa refusal, as the US government hasn’t disclosed detailed criteria. However, guidance to consular officers indicates that digital behaviour suggesting “hostility” toward the US or its values may be grounds for concern.

Internal advice suggests officers are trained to look for social media content that may reflect extremist views, criminal associations or ideological opposition to the US.

Political ‘passport’

In a sense, this policy turns a visa applicant’s online presence into a kind of political passport. It allows for scrutiny not just of past behaviour but also of ideological views.

Digital identity is not just a technical construct. It carries legal, philosophical and historical weight. It can influence access to rights, recognition and legitimacy, both online and offline.

Once this identity is interpreted by state institutions, it can become a tool for control shaped by institutional whims. Governments justify digital surveillance as a way to spot threats. But research consistently shows it leads to overreach.

A recent report found that US social media monitoring programs have frequently flagged activists and religious minorities. It also found the programs lacked transparency and oversight.

Digital freedom nonprofit Electronic Frontier Foundation has warned these tools risk punishing people for lawful expression or for simply being connected to certain communities.

The US is not alone in integrating digital surveillance into border security. China has implemented social credit systems. And the United Kingdom is exploring digital ID systems for immigration control. There are even calls for Australia to use artificial intelligence to facilitate digital border checks.

The United Nations has raised concerns about the global trend toward digital vetting at borders, especially when used without judicial oversight or transparency.

A free speech issue

These new checks could have a chilling effect on self-expression. This is particularly true for those with views that don’t align with governments or who are from minority backgrounds.

We’ve seen this previously. After whistleblower Edward Snowden revealed widespread use of data gathering by US intelligence agencies, people stopped visiting politically sensitive Wikipedia articles. Not because they were told to, but because they feared being watched.

This policy won’t just affect visa applicants. It could shift how people use social media in general. That’s because there is no clear rulebook for what counts as “acceptable”. And when no one knows where the line is, people self-censor more than is necessary.

What can you do?

If you think you might apply for an affected visa in the future, here are some tips.

1. Audit your social media history now. Old posts, “likes” or follows from years ago may be reviewed and judged out of context. Review your public posts on platforms such as Instagram, Facebook and X. Delete or archive anything that might be misconstrued (a small scripting sketch follows this list).

2. Separate personal and professional online identities. Consider keeping distinct accounts for private and public engagement. Use pseudonyms for creative or informal content. Immigration authorities are far less likely to misinterpret context when your online presence is clearly tied to your educational or professional goals.

3. Understand your online visibility and history. Even if you have privacy settings enabled, tagged content, public “likes”, comments and follows can still be seen. Algorithms expose content based on associations, not just what you post. Don’t assume your visibility is limited to your followers.

4. Keep records of any deleted or misinterpreted posts. If you think something might be questioned or if you delete posts ahead of an application, keep a backup. Consular officials may request clarification or evidence. It’s better to be prepared than to be caught off-guard without explanation.
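For the audit in tip 1, a small script can help you triage an exported archive of your posts. The sketch below is purely hypothetical: the file name, JSON layout, and keyword list are all invented, and real platform exports use different formats.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical post-export scanner: flags posts that are old enough to have
# been forgotten, or that contain keywords you want to re-read in context.
FLAG_KEYWORDS = {"protest", "boycott", "rally"}  # illustrative only

with open("my_posts_export.json") as f:
    posts = json.load(f)  # assumed shape: [{"date": "YYYY-MM-DD", "text": "..."}]

cutoff = datetime.now(timezone.utc) - timedelta(days=5 * 365)  # ~five years

for post in posts:
    posted = datetime.fromisoformat(post["date"]).replace(tzinfo=timezone.utc)
    too_old = posted < cutoff
    keyword_hit = any(k in post["text"].lower() for k in FLAG_KEYWORDS)
    if too_old or keyword_hit:
        print(f"review: {post['date']} | {post['text'][:60]}")
```

Nothing here decides anything for you; it simply surfaces candidates for the manual review the tip recommends.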

Your social media is no longer a personal space. It may be used by governments to determine whether you fit in.


Nearly 1 in 3 teens have prediabetes, CDC finds, in ‘wake-up call’


About 1 in 3 young people aged 12 to 17 have prediabetes, new national data show, putting them at risk not just for type 2 diabetes but also for heart disease and stroke. Developing chronic diseases early in life also heightens their chances of worse outcomes from these conditions.

Experts said the data reflect a concerning rise in obesity among teens, but also noted that not all teens with prediabetes will progress to diabetes.



