Our AI work starts where the demo ends: getting models out of a notebook and into a workflow that customers, employees or regulators actually depend on. That means designing for evaluation, observability and the boring parts of operating an AI system — not just the moment a model gives a good answer.
The pattern we keep seeing is this: building the first model is the easiest 20% of the engagement. Operating it through a year of distribution shift, prompt updates, model upgrades and regulatory scrutiny is the rest. We size programmes accordingly and we are honest with clients about where the cost sits.
What we build
Generative AI applications
LLM-powered products for customer support, internal knowledge, contract review, claims, underwriting and content operations. We choose the model and the retrieval pattern that fits the data and the risk profile — not whatever was in the keynote last month.
Most of these systems live or die on retrieval quality and on the evaluation suite that catches regressions before users see them. We pay disproportionate attention to both, and we resist the temptation to demo the model alone when the whole pipeline is what matters.
AI agents
Multi-step agents that take action against your systems — opening tickets, drafting responses, routing approvals, reconciling data. Built with explicit tool definitions, audit trails and human-in-the-loop where the stakes warrant it.
We are practical about where agents earn their place. For routine multi-step work with clear tool contracts and recoverable failure modes, they pay back quickly. For ambiguous, high-stakes decisions we keep humans in the loop and the agent suggests rather than acts.
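The approval gate described above can be sketched in a few lines. This is a minimal illustration, not our production framework: the `Tool` wrapper, the `dispatch` function and the in-memory `audit_log` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    # Hypothetical tool wrapper: a name, the action itself, and a flag
    # marking whether a human must approve before side effects occur.
    name: str
    run: Callable[[dict], str]
    requires_approval: bool = False

audit_log: list = []  # every call is recorded, whether acted on or not

def dispatch(tool: Tool, args: dict, approved: bool = False) -> str:
    """Execute a tool call, enforcing the approval gate and audit trail."""
    if tool.requires_approval and not approved:
        audit_log.append((tool.name, args, "suggested"))
        return f"SUGGESTION: {tool.name}({args}) awaiting human approval"
    audit_log.append((tool.name, args, "executed"))
    return tool.run(args)

# Routine, recoverable action: the agent may act directly.
open_ticket = Tool("open_ticket", lambda a: f"ticket {a['id']} opened")
# High-stakes action: the agent only suggests.
issue_refund = Tool("issue_refund", lambda a: f"refunded {a['amount']}",
                    requires_approval=True)
```

The point of the pattern is that the stakes decision lives in the tool definition, not in the prompt, so it cannot be talked around by the model.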
Computer vision for the physical world
Vision systems that classify, sort and grade materials in industrial settings — from material identification on conveyors and pick-lines, to defect detection on production runs, to robotic-arm sorting that adjusts dynamically to upstream conditions. Trained on real plant data, deployed at the edge, monitored centrally.
The tricky parts are not the models. They are the camera placement, the lighting, the dataset that reflects real failure modes, and the retraining schedule when the upstream feed changes. We engineer for all of that as a first-class concern.
Retrieval, search and grounding
Hybrid retrieval (vector + lexical + structured), document chunking and re-ranking pipelines that consistently surface the right context. Source citations are non-negotiable — every answer must be traceable back to the documents that produced it.
For regulated environments we go further: retrieval over signed source corpora, prompt-injection defences on retrieved content, and explicit caveats when the model is answering from out-of-scope material.
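One common way to fuse vector and lexical rankings is reciprocal rank fusion (RRF). The sketch below shows the idea under illustrative assumptions — the document ids and retriever outputs are made up, and a real pipeline would fuse structured retrieval and re-ranking on top:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal-rank fusion: combine ranked doc-id lists from
    several retrievers (vector, lexical, ...) into one ranking.
    k dampens the influence of any single retriever's top hit."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from two retrievers over the same corpus.
vector_hits  = ["doc_claims_7", "doc_policy_2", "doc_faq_9"]
lexical_hits = ["doc_claims_7", "doc_terms_4", "doc_policy_2"]

fused = rrf_fuse([vector_hits, lexical_hits])
# Citations travel with the answer: the doc ids that produced it.
answer = {"text": "...", "citations": fused[:2]}
```

Because the fused ids are carried through to the answer, every response stays traceable back to its source documents — the non-negotiable citation requirement above.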
Decision systems integrating market and operational data
AI that combines real-time operational telemetry with external signals — commodity benchmarks, weather, demand forecasts — to make process decisions that hold up financially as well as technically. Used to tune feedstock mix, balance throughput against margin, and prioritise output streams against current pricing.
Done well these systems quietly raise the margin floor of an operation. Done badly they make confident decisions on stale data. We invest heavily in data freshness, feature lineage and a clear separation between recommendation and autonomous action.
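The separation between recommendation and autonomous action can be enforced with a freshness gate. This is a simplified sketch — the 15-minute threshold, the signal names and the `decide` function are illustrative, not a client configuration:

```python
from datetime import datetime, timedelta, timezone

MAX_STALENESS = timedelta(minutes=15)  # illustrative threshold

def decide(recommendation, signal_timestamps, now=None):
    """Gate autonomous action on input freshness: if any signal is
    stale, downgrade the decision to a recommendation for review."""
    now = now or datetime.now(timezone.utc)
    stale = [name for name, ts in signal_timestamps.items()
             if now - ts > MAX_STALENESS]
    mode = "recommend" if stale else "act"
    return {"mode": mode, "stale_inputs": stale, "action": recommendation}
```

A system built this way can never act confidently on stale data; the worst case is that it falls back to suggesting, which a human can sanity-check.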
Evals and guardrails
Continuous evaluation suites, red-teaming, prompt-injection and PII defences. Production AI systems have a CI suite of their own — ours run on every model upgrade, every prompt change and every retrieval-index rebuild.
We treat evals as product features, not as testing afterthoughts. They are versioned, owned by named engineers, and reviewed at the same cadence as the application code that depends on them.
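In miniature, an eval suite of this kind is just pinned cases run against every new model, prompt or index. The sketch below is illustrative — the cases, the pass criteria and the stand-in model are invented for the example, and real suites use richer scoring than substring checks:

```python
# Each case pins an input, required content and forbidden content.
EVAL_CASES = [
    {"input": "What is the claims deadline?",
     "must_contain": "30 days", "must_not_contain": "I'm just an AI"},
    {"input": "Summarise policy P-12.",
     "must_contain": "P-12", "must_not_contain": "guess"},
]

def run_suite(model, cases):
    """Return (passed, failures). Run on every model upgrade, prompt
    change and retrieval-index rebuild; fail the build on regressions."""
    failures = []
    for case in cases:
        output = model(case["input"])
        if case["must_contain"] not in output:
            failures.append((case["input"], "missing required content"))
        if case["must_not_contain"] in output:
            failures.append((case["input"], "forbidden content present"))
    return len(failures) == 0, failures

# Stand-in model for illustration only.
fake_model = lambda q: f"Regarding '{q}': the claims deadline is 30 days; see P-12."
ok, failures = run_suite(fake_model, EVAL_CASES)
```

Versioning the case list alongside the application code is what makes the suite reviewable at the same cadence as the code that depends on it.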
MLOps and inference
Versioned prompts, model and dataset registries, drift detection and cost monitoring. We size and operate inference on the right substrate for the workload — managed APIs, hosted models, or self-managed GPU clusters — based on latency, sovereignty and unit economics.
The fastest way to overspend on AI is to leave inference unmonitored. We instrument every call, attribute cost back to the feature that drove it, and set explicit budgets that engineers see in their dashboards.
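Cost attribution of this kind reduces to tagging every inference call with the feature that drove it. A minimal sketch, with hypothetical per-token prices and budget figures — real pricing and budgets vary by provider, model and client:

```python
from collections import defaultdict

# Hypothetical per-1k-token prices; real pricing varies by provider.
PRICE_PER_1K = {"small-model": 0.0005, "large-model": 0.01}
BUDGETS = {"support_bot": 50.0, "contract_review": 200.0}  # illustrative

cost_by_feature = defaultdict(float)

def record_call(feature, model, tokens):
    """Attribute the cost of one inference call to the feature that
    drove it, and flag when that feature exceeds its budget."""
    cost = tokens / 1000 * PRICE_PER_1K[model]
    cost_by_feature[feature] += cost
    over_budget = cost_by_feature[feature] > BUDGETS[feature]
    return cost, over_budget
```

The running totals in `cost_by_feature` are what feed the engineer-facing dashboards: spend per feature, against an explicit budget, rather than one opaque provider invoice.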
Responsible AI
Policy review, model cards, bias and fairness testing, and the documentation regulators are starting to ask for. Most clients want a defensible position more than a flashy product, and that is what we deliver.
We are familiar with the EU AI Act, NIST AI RMF, and the equivalent guidance emerging in the UK, US states and APAC. Compliance is real work, but it overlaps heavily with good engineering hygiene — and we build to that overlap.
Where we ship
- Financial services — claims, underwriting, contracts, complaints handling.
- Healthcare and life sciences — clinical knowledge, prior-auth, trial document review.
- Manufacturing and industrial — vision-based sorting, defect detection, plant operations co-pilots.
- Energy and utilities — generation forecasting, asset health, market-aware dispatch.
- Retail and consumer — search, merchandising, support automation.
- Public sector — case management, citizen knowledge bases, eligibility assistance.
Most engagements begin with a focused diagnostic — what the operating model needs, what the data supports, what the regulatory line is. The diagnostic takes weeks, not quarters, and the answer it produces is sometimes that the right move is not AI at all. That is part of the value.