Senior Data Engineer

Ancora
Published on 21 March
Description

Ancora is building AI-native accounting software that replaces traditional accounting management systems. We're not improving the status quo; we're replacing it entirely, moving from software that waits for human input to an autonomous agent that performs accounting under professional supervision.




We're an Italian startup based in Milan. Our model combines a roll-up strategy, acquiring and consolidating accounting firms (studi commercialisti) across Italy, with a technology platform that automates the operational work these firms do every day: bookkeeping, tax filings, document management, and compliance deadlines.


The vision is clear: free accounting professionals from repetitive tasks so they can focus on what actually requires human judgment — strategic consulting, client relationships, and growing their practice. We're building the infrastructure that makes this possible at scale.


Where we are today. We're venture-backed by some of Italy's best investors, with our first studio acquisitions underway. The product is greenfield — zero legacy code, modern stack, built from scratch. Our engineering team is designing the entire architecture now. The decisions we make today will shape the system for years.


What makes Ancora different. We own the problem end-to-end: we build the technology, we acquire the firms, we operate the service. This means we control the feedback loop between what accountants need and what we build. We're not selling software to reluctant buyers — we're building it for firms we operate. The technology has to work because our business depends on it. Italian accounting is deeply regulated, complex, and largely untouched by modern software. The domain complexity is real, and that's what makes it interesting.

About the Role

As our Data Engineer, you'll design and build the data infrastructure that powers our system: from ingestion pipelines that handle heterogeneous document formats, to the RAG and knowledge graph architecture that enables intelligent retrieval and reasoning.


The technical challenges. You'll tackle temporal versioning at scale — tracking how authoritative documents evolve over time with complex effective dates, retroactive changes, and transitional provisions. You'll parse natural language amendments, extracting structured diffs from modifications like "replace X with Y in paragraph 3" and reconstructing consolidated versions programmatically. You'll build multi-layer knowledge graphs connecting source documents to their interpretations, amendments, and operational mappings, preserving semantic authority levels across the graph. You'll normalize heterogeneous sources — ingesting from dozens of formats (PDFs, HTML, XML, scanned documents) with no standardized structure — into a unified, queryable corpus. And you'll design context-dependent retrieval systems where the correct answer depends not just on the query, but on multi-dimensional context: time, jurisdiction, entity profile.
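To make the amendment-parsing challenge concrete, here is a minimal sketch that handles one narrow phrasing of the kind quoted above ("replace X with Y in paragraph 3"). The regular expression and the names `Replacement`, `parse_amendment`, and `apply_amendment` are hypothetical and for illustration only. They do not describe Ancora's system, and real amendments would need far more than a single pattern.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern covering one phrasing only; real amendments vary widely.
AMENDMENT_RE = re.compile(
    r'replace\s+"(?P<old>[^"]+)"\s+with\s+"(?P<new>[^"]+)"'
    r'\s+in\s+paragraph\s+(?P<para>\d+)',
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Replacement:
    """A structured diff extracted from a natural-language amendment."""
    paragraph: int  # 1-based paragraph number
    old: str
    new: str


def parse_amendment(text: str) -> Replacement:
    """Extract a structured diff, or fail loudly on unknown phrasing."""
    match = AMENDMENT_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognized amendment: {text!r}")
    return Replacement(int(match["para"]), match["old"], match["new"])


def apply_amendment(paragraphs: list[str], diff: Replacement) -> list[str]:
    """Return a consolidated copy of the document; the input is not mutated."""
    index = diff.paragraph - 1
    if not 0 <= index < len(paragraphs):
        raise IndexError(f"paragraph {diff.paragraph} does not exist")
    if diff.old not in paragraphs[index]:
        raise ValueError(f"{diff.old!r} not found in paragraph {diff.paragraph}")
    consolidated = list(paragraphs)
    consolidated[index] = consolidated[index].replace(diff.old, diff.new, 1)
    return consolidated


original = ["Scope.", "Filing is due within 30 days.", "Penalties apply."]
diff = parse_amendment('Replace "30 days" with "60 days" in paragraph 2')
consolidated = apply_amendment(original, diff)
```

Failing loudly when the target text is missing is a deliberate choice in this sketch: a silently skipped amendment would produce a consolidated version that looks valid but is wrong.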

What You'll Do


Build the data ingestion and normalization infrastructure. Design multi-format ingestion pipelines (PDFs with OCR, HTML, XML, scanned documents) from heterogeneous sources. Transform documents into a unified schema while preserving semantic distinctions — authority levels, document types, versioning metadata. Handle edge cases: implicit cross-references, evolving formats, natural-language amendments, missing metadata. Build validation pipelines to catch ingestion errors and monitor source freshness.
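One possible shape for the unified schema and validation step is sketched below. The field names, the allowed format labels, and the `validate` helper are assumptions made for illustration; the posting does not specify the actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical labels for the source formats named in the posting.
ALLOWED_FORMATS = {"pdf", "html", "xml", "scan"}


@dataclass
class NormalizedDocument:
    """Unified record produced by every ingestion pipeline (illustrative)."""
    source_id: str
    source_format: str
    text: str
    authority_level: Optional[str] = None   # e.g. statute vs. interpretation
    document_type: Optional[str] = None
    effective_from: Optional[date] = None   # versioning metadata


def validate(doc: NormalizedDocument) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if doc.source_format not in ALLOWED_FORMATS:
        problems.append(f"unknown format: {doc.source_format}")
    if not doc.text.strip():
        problems.append("empty text")
    for field_name in ("authority_level", "document_type", "effective_from"):
        if getattr(doc, field_name) is None:
            problems.append(f"missing {field_name}")
    return problems


good = NormalizedDocument(
    "doc-1", "pdf", "Article 1 ...", "statute", "law", date(2024, 1, 1)
)
bad = NormalizedDocument("doc-2", "docx", "   ")
```

Returning every problem at once, rather than raising on the first, lets a validation pipeline report missing metadata per source and track how often each failure occurs.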


Own the RAG and knowledge graph architecture. Design hierarchical RAG systems with chunk-level embeddings, document-level summaries, and cross-document relationship modeling. Construct knowledge graphs connecting source documents to their interpretations, amendments, and operational mappings. Extract relationships from complex text — references, hierarchies, temporal dependencies — using NLP and LLM-based approaches. Implement temporal versioning with complex effective date logic and retroactive change tracking.
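Temporal versioning with retroactive changes can be approached by tracking two dates per version: when it takes effect and when it was published. The sketch below is a simplified assumption of how such a lookup might work. `Version` and `version_as_of` are hypothetical names, and real effective-date logic (transitional provisions, repeals, partial amendments) would be considerably more involved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    text: str
    effective_from: date  # when the rule starts to apply
    published_on: date    # when the rule became known


def version_as_of(
    versions: list[Version], effective_on: date, known_on: date
) -> Optional[Version]:
    """Return the version in force on `effective_on`, as known on `known_on`.

    Simplification: among applicable versions, the latest effective date
    wins, and ties are broken by the latest publication date.
    """
    candidates = [
        v for v in versions
        if v.published_on <= known_on and v.effective_from <= effective_on
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: (v.effective_from, v.published_on))


versions = [
    Version("rate 22%", date(2024, 1, 1), date(2023, 12, 1)),
    # Retroactive change: published in June, effective from January.
    Version("rate 20%", date(2024, 1, 1), date(2024, 6, 1)),
]
before = version_as_of(versions, date(2024, 3, 1), known_on=date(2024, 4, 1))
after = version_as_of(versions, date(2024, 3, 1), known_on=date(2024, 7, 1))
```

Keeping both dates makes it possible to answer two different questions about the same period: what the rule was believed to be at the time, and what it is now understood to have been.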

Ensure data quality and enable downstream AI. Build tooling for data quality audits, anomaly detection, and validation at scale. Provide retrieval APIs (RAG + graph queries) for the reasoning engine to consume. Design systems where retrieval accuracy directly determines AI agent correctness.


What We're Looking For


Must Have

* 3+ years of experience building production data pipelines (ETL/ELT).
* Strong proficiency in Python — our primary language for data work.
* Experience with data extraction from messy sources — PDFs, HTML scraping, document parsing.
* Hands-on experience with data orchestration tools (Airflow, Prefect, Dagster, or similar).
* Solid understanding of data modeling and schema design.
* Experience with SQL and NoSQL databases (Postgres, MongoDB, or similar).
* Ability to write robust, testable, maintainable code.
* Comfort working with ambiguity and iterating on solutions.


Nice to Have

* Cloud environments (AWS preferred) and infrastructure as code (Terraform).
* RAG systems — vector databases (Pinecone, Weaviate, Qdrant), embedding models, retrieval strategies.
* Graph databases (Neo4j, Neptune) and graph query languages (Cypher, Gremlin).
* OCR pipelines (Tesseract, cloud OCR services).
* NLP for information extraction — entity recognition, relationship extraction, LLM-based parsing.
* Regulated industries (legal, finance, healthcare) where data quality is critical.
* MLOps practices — feature stores, data versioning (DVC), model monitoring.


Mindset

* Detail-oriented. You understand that edge cases matter when dealing with complex documents.
* Pragmatic. You balance "perfect" with "good enough to ship" and iterate.
* Curious. You want to understand the domain, not just move data around.
* Collaborative. You work closely with ML engineers, backend engineers, and domain experts.


Why This Role is Interesting

* Foundational impact. Your work directly determines whether AI agents can operate autonomously in a regulated domain. No data quality = no intelligence.
* Unique technical challenges. Build temporal knowledge graphs for constantly evolving authoritative documents, parse natural-language amendments into structured diffs, and design context-dependent RAG systems: problems with no existing solutions.
* Greenfield with modern practices. Zero legacy code, design the data architecture from scratch, build with best-in-class tools: vector DBs, graph databases, modern orchestration.
* Full-stack ownership. Own everything from raw PDFs to the knowledge infrastructure that powers AI reasoning, working directly with the founding team.


What We Offer

* Office First. Collaboration is easier and more effective in person at our Milan HQ. You can work from home up to 30% of the time and enjoy great company during our three core days in the office.
* Compensation. €50,000 – €75,000 gross annual salary, based on experience.
* Equity. Meaningful stock options package, based on experience and scope. Options are offered at a strike price of 0, with strong upside potential given the company's stage.
* Benefits. Meal vouchers and fringe benefits.
* Team. Direct collaboration with domain experts and the founding team.
* Autonomy. Strong voice in technical decisions, from data architecture to tooling choices.
* Growth. Opportunity to build a critical piece of infrastructure from zero as the team scales.


How to Apply

Send your CV to with a brief note on why this role interests you.

We're an equal opportunity employer. We value diversity and encourage applications from people of all backgrounds.
