Contracted Intelligence

All of this AI speed is the opposite of what is needed,
Bring me 10x quality or get the f… out.
zate75

1. Target: To get LLMs to produce reliable software.

LLMs are capable of producing a lot of code, but those programs are not reliable.

As summarized by this headline:

Amazon called an emergency meeting after outages caused by AI-assisted code changes. The fix: put senior engineers back in the loop – This week in AI News

For the past 50 years we have known that formal methods enable reliable software development, but they remain unpopular in industry due to project execution methods, certification concerns, and the high cost associated with formal methods [2].

The high cost of applying formal methods has made them obvious candidates to try to automate with Generative AI. As shown by the work of [1] and [3] However to the best of our knowledge there has not been work on using Design By Contract and LLMs.

And there are two industry implementations of Design by Contract Eiffel's and ADA with SPARK. Of those two implementations we decided to use Eiffel because of its Simplified Concurrent Object Oriented Programming (SCOOP). Since SCOOP will help with the concurrency and distributed experiments.

So, given that LLMs make developing code cheaper and Formal Methods make developing software higher quality.

On this project we will try to use formal methods, in its Design by Contract incarnation, to guide LLMs creating software. To get: "10x quality software, not just 10x lines of code"

2. Methodology

During the project we will have two groups: Generate by Unit Tests (GenByTests) and Generate by Contracts (GenByContracts) To compare code generated by LLMs with the current state of the practice (following unit tests) vs our proposal (generated by contracts).

2.1. Experiment protocol

Each experiment runs the following procedure:

Specification phase (human-authored):

GenByContracts
A human writes a complete DbC specification for the target library: preconditions, post-conditions, class invariants, and frame rules. No implementation is written.
GenByTests
A human writes a test suite for the same target library, split into a training set (given to the LLM) and a verification set (held out for evaluation). No implementation is written.

Generation phase (LLM-authored)

GenByContracts
An LLM receives the DbC specification and generates a complete Eiffel implementation.
GenByTests
An LLM receives the training-set tests and generates a complete Eiffel implementation.

Evaluation phase

Both implementations are evaluated on the same axes:

Cost
API expense to produce a compiling, contract-passing (GenByContracts) or training-set-passing (GenByTests) implementation.
Time
Time required to produce a compiling implementation on each group.
Quality
performance on the held-out verification set, which neither group has seen during generation.

This design ensures the verification set is an unbiased judge: it tests behavioral correctness that was never explicitly provided to either LLM.

2.2. Experiment Iterations

For our experiments we will develop an implementation of Weiher's [6] Architecture Oriented Programming (AOP) library in Eiffel.

In 4 iterations each iteration with a bigger level of complexity.

The four experiments are:

Baseline
Implement (with GenByContracts and GenByTests) known data-structures with known contracts. Like stack, queue or hash-map.
AOP - Single
In a single threaded environment, implement Weiher's six patterns
AOP - Concurrent
Implement the same patterns now in a multi-threaded processor. For this we will take advantage of Eiffel SCOOP (Simplified Concurrent Object Oriented Programming) [4]
AOP - Distributed
Implement the same patterns now in a distributed system. For this we will build on the thesis [5]

experiment_protocol.svg

Figure 1: Experiment protocol, for 4 experiments

2.3. What is AOP? and Why will it be our library example case?

AOP has a simple but powerful idea:

What would happen if we took the architecture of large software systems, (like HTTP and the web) and shrunk it to the level of a single program?

With this principle Weiher developed the Objective-Smalltalk programming language, but it remains the only implementation and AOP is only documented on his corpus of work. Not specified in an executable format.

The six AOP patterns that Weiher documents are:

Table 1: Six patterns of AOP
Architectural Inspiration AOP Pattern
Internet URIs Polymorphic Identifiers [8]
Internet Protocols Schemes [8]
Spreadsheet cell references References [8]
Unix pipes and filters Polymorphic Write Streams [9]
REST / stackable file-systems Storage Combinators [10]
Constraint programming Constraints as Polymorphic Connectors [7]

We selected this application because:

  • It is complex enough, it is not a trivial case.
  • It is documented in the academic literature, but there is only one implementation so we can assume LLMs haven't been trained on how to develop it.
  • It is an ideal candidate to be specified with Design by Contract by virtue of being inspired by HTTP Protocol that was inspired by DbC.
  • The resulting library would be valuable on its own.

3. Budget

Table 2: Indicative Budget
Item Description Count Unit Cost Subtotal
PhD stipend Researcher salary 36 months mxn $ 21,397.32 mxn $ 770,303.52
LLM API access 4 experiments × Groups GenByContracts + GenByTests 9 months USD $ 200 / month mxn $ 64,297.08
Eiffel Studio Academic license 0   mxn $ 0
Conference travel 4 presentations (ICSE / FM / VerifyAI) 4 conferences EUR $690 + USD $1884 mxn $ 191,462.18
Article Processing Fees Will publish in Diamond Open Access journals (https://programming-journal.org , https://www.jot.fm) 4   mxn $ 0
Computing infrastructure CIMAT provision     mxn $ 0
      TOTAL mxn $1,026,052.56

4. Status

I have been working on an Eiffel AOP implementation since May 2025. During that time I became convinced that AOP is good paradigm to copy in other patterns. But that I would need to speed up my productivity if I ever wanted to finish it. That is why now I will focus on getting Generative AI to produce a correct implementation of AOP guided by contracts. So I have already the human generated contracts that will become the basis for the second experiment AOP - Single.

5. Participants

PhD Candidate
Alejandro García F., Msc. in Software Engineering from CIMAT Zacatecas Software engineer with over 30 years of experience. And the originator of the Contracted Intelligence approach, has prior published work on Software Engineering, and brings experience in formal methods, literate programming, and technical writing to the project.
Research Supervisor
Dr. Carlos Lara-Alvarez, Director of CIMAT Zacatecas, holds SNI Level I and has 537 academic citations with industry collaborations at Intel and commercial software firms on safety-critical systems.

His expertise in probabilistic models, software testing, and human-machine systems directly covers the experimental design and evaluation methodology of this project.

6. Results and Impact

The project will produce empirical evidence comparing contract-guided vs test-guided LLM code generation, including:

  • A validated methodology for using DbC to guide LLMs in generating reliable software
  • Quantitative measurements (cost, time, defect rates) comparing both approaches across four complexity levels
  • A complete, working implementation of Weiher's AOP library in Eiffel (valuable independently)
  • Four peer-reviewed publications documenting findings at each experiment iteration

Delivering these results will enable software engineering teams to adopt contract-guided AI coding practices with evidence-based confidence, and provide AI tool developers with design principles for integrating formal specifications into code generation workflows.

7. Timeline

timeline.svg

Figure 2: The project will last for 36 months

8. What I am looking for

To be accepted in the DOCTORADO EN CIENCIAS FÍSICO MATEMÁTICAS CON ORIENTACIÓN EN PROCESAMIENTO DIGITAL DE SEÑALES INTELIGENCIA ARTIFICIAL


Alejandro García F. agarciafdz@gmail.com <2026-04-02 jue>

9. References

[1]
[2]
Davis, J.A. et al. 2013. Study on the barriers to the industrial adoption of formal methods. International workshop on formal methods for industrial critical systems (2013), 63–77.
[3]
Ma, L. et al. 2025. Specgen: Automated generation of formal program specifications via large language models. 2025 Ieee/acm 47th international conference on software engineering (icse) (2025), 16–28.
[4]
Morandi, B. et al. 2008. A formal reference for scoop. Laser summer school on software engineering. Springer. 89–157.
[5]
Schmocker, R. 2014. Concurrency patterns in scoop. (2014).
[6]
Weiher, M. 2020. Can programmers escape the gentle tyranny of call/return? Companion proceedings of the 4th international conference on art, science, and engineering of programming (2020).
[7]
Weiher, M. and Hirschfeld, R. 2016. Constraints as polymorphic connectors. Proceedings of the 15th international conference on modularity (2016).
[8]
Weiher, M. and Hirschfeld, R. 2013. Polymorphic identifiers: Uniform resource access in objective-smalltalk. Dynamic languages symposium (2013).
[9]
Weiher, M. and Hirschfeld, R. 2019. Standard object out: Streaming objects with polymorphic write streams. Proceedings of the 15th acm sigplan international symposium on dynamic languages (2019).
[10]
Weiher, M. and Hirschfeld, R. 2019. Storage combinators. Proceedings of the 2019 acm sigplan international symposium on new ideas, new paradigms, and reflections on programming and software (2019).