
What does AI need to be really effective?

Perspectives on the origin of the gaps between human production and AI output.

If a lion could speak, we could not understand him
Ludwig Wittgenstein

In the 1920s, visions of the future included flying cars with gyroscopic stabilization and collapsible blades that allowed the vehicle to transition between road travel and aerial flight. These machines would be built from lightweight, fireproof materials and potentially powered by wireless electricity. These ideas have come and gone like meteors. What has remained is the pattern: waves of hype and promises of an almost mythical future of ease and fortune that, looking back at history, has never materialized.

This type of overconfidence about the future seems to be a feature, not a bug, so there's nothing new here. Still, there is a complex relationship, one I can't quite put my finger on, between our capacity to create abstractions and become infatuated with our own mental creations, and our legitimate search for knowledge and improvement.

Epistemic gaps

Our mental models are not necessarily true, nor are they faithful representations of reality. There's a gap between model and reality, and some people are more receptive to this difference than others.

The AI-positivists propose the idea that AI produces outputs that can, in sufficiently advanced cases, be treated as epistemically sufficient and not requiring significant human interpretation to have practical value. The most convinced supporters would even conclude that subjective interpretation, theory, or human judgment becomes secondary.

The physicalist interpretation leads to a more sober understanding of AI as a system that generates token sequences through statistical optimization. The output does not by itself constitute knowledge or truth; those require human interpretation and validation. The accuracy of the output is a property of the optimization process, not an intrinsic quality of the system.
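To make the physicalist picture concrete, here is a minimal sketch of what "generating token sequences" from a probability distribution can look like. It is a toy, not a description of how any real model works: the vocabulary, the probabilities in `NEXT_TOKEN_PROBS`, and the `generate` function are all invented for illustration.

```python
import random

# Hypothetical next-token probabilities, made up for this illustration.
# A real system would learn a far larger table through statistical optimization.
NEXT_TOKEN_PROBS = {
    "the": {"lion": 0.6, "car": 0.4},
    "lion": {"speaks": 0.7, "sleeps": 0.3},
    "car": {"flies": 0.5, "stops": 0.5},
}


def generate(start, steps, seed=0):
    """Sample a token sequence by repeatedly drawing the next token."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    tokens = [start]
    for _ in range(steps):
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:
            break  # no known continuation for this token
        choices, weights = zip(*dist.items())
        tokens.append(rng.choices(choices, weights=weights, k=1)[0])
    return tokens


print(" ".join(generate("the", 2)))
```

Nothing in the procedure checks whether the resulting sentence is true or meaningful. On this view, that judgment is left to whoever reads the output.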

The point of disagreement can be summarized as follows: AI-positivism (as defined above) sees generated outputs as epistemically sufficient, while physicalism would see those outputs as dependent on human interpretation in order for them to have epistemic content.

The idea is that if we're unwilling to grant AI-generated output the authority of knowledge, it seems counterintuitive to expect it to replace humans. We tend to think that words have meaning when they are spoken by a human and generated through a voluntary process. By contrast, words plucked from a probability distribution are just symbols. Why would they carry any weight? Meaning seems to sit firmly in the field of human interpretation, and it doesn't seem to be reducible to an interface, a button, low-latency measurements and other benchmarks.

Micro productivity, macro mess

While coding assistants can make (some) specific tasks easier, that sliver of work lives within the context of a complex organization. The complexity is such that the increase in productivity is largely bottlenecked by managerial funnels, by hard project constraints such as compliance, or by technical concerns that require human attention, such as security, cloud infrastructure or software architecture.
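A rough way to see the bottleneck is an Amdahl's-law-style estimate, which is my framing rather than something measured: if only a fraction of the total work is sped up, the rest caps the overall gain. The function name `overall_speedup` and the numbers below are hypothetical, chosen only to show the shape of the argument.

```python
def overall_speedup(assisted_fraction, task_speedup):
    """Amdahl-style estimate of the end-to-end speedup.

    assisted_fraction: share of total work the assistant touches (0 to 1).
    task_speedup: how many times faster that share becomes.
    """
    if not 0.0 <= assisted_fraction <= 1.0:
        raise ValueError("assisted_fraction must be between 0 and 1")
    if task_speedup <= 0:
        raise ValueError("task_speedup must be positive")
    # Time left after the change, as a fraction of the original time.
    remaining = (1.0 - assisted_fraction) + assisted_fraction / task_speedup
    return 1.0 / remaining


# Hypothetical numbers: 20% of the work becomes twice as fast.
print(round(overall_speedup(0.2, 2.0), 2))  # 1.11
```

With these made-up inputs, doubling the speed of a fifth of the work yields only about an 11% overall improvement. The other 80% (reviews, compliance, architecture decisions) sets the pace.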

Solving problems we cannot define

To put it simply, AI is most effective when used to solve problems we're already quite good at defining. When goals are ambiguous and definitions are vague, neither AI nor humans perform well. It's precisely the difficult problems we want AI to solve that we can't define sufficiently well, nor can we define what a solution to those problems would look like. In the end, it's humans who determine which problems are worth solving and what "success" looks like.

Conclusion

Integrating AI effectively is still difficult. Interacting with AI is still quite dull. Its output feels dry and ineffective when placed in complex human contexts, which are rich in interpretations and interactions. The difficulty of capturing exactly what's missing is probably due to the fact that it cannot be reduced to easily measurable quantities such as accuracy, natural language fluency, retrieval accuracy and other benchmark scores. The missing dimensions are more human-like: situational awareness, shared meaning, cultural presuppositions, contextual grounding and more.