Document Lifecycle

How current internal OCR states map to the future public document status model in the Numora Public API developer preview.

Scope

This page explains how a document moves through the Numora pipeline.

It also distinguishes between:

  • The current internal OCR states already used by the existing app.
  • The future normalized document.status values exposed by the public API.

Why Two Status Layers Exist

The current Numora app stores OCR-oriented internal states that reflect the extraction pipeline directly.

The public API should expose a smaller and more stable status model for external developers. That public model should be easy to understand without leaking internal implementation details.

Current Internal OCR States

The current app uses these internal OCR states:

  • pending
  • processing
  • for_review
  • reviewed
  • failed

These values describe the internal extraction and review flow used by the existing OCR subsystem.

Public Document Status Model

The public API should normalize those internal states into the following document.status values:

  • processing
  • review_required
  • reviewed
  • push_pending
  • push_succeeded
  • push_failed
  • failed

The first three values, together with failed, cover the initial public document lifecycle. The push_* values become relevant only when the public document resource also reflects downstream write-back state.
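
As a rough illustration, the public values could be modeled as a string-backed enumeration. This is only a sketch: the names DocumentStatus and is_delivery_status are hypothetical and are not part of the Numora API.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    """Public document.status values from this guide (class name is hypothetical)."""

    PROCESSING = "processing"
    REVIEW_REQUIRED = "review_required"
    REVIEWED = "reviewed"
    PUSH_PENDING = "push_pending"
    PUSH_SUCCEEDED = "push_succeeded"
    PUSH_FAILED = "push_failed"
    FAILED = "failed"


def is_delivery_status(status: DocumentStatus) -> bool:
    """Return True for the push_* values, which describe downstream write-back."""
    return status.value.startswith("push_")
```

Parsing a raw string with DocumentStatus("review_required") yields DocumentStatus.REVIEW_REQUIRED, and any value outside the list raises ValueError.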

Status Mapping

| Current internal OCR state | Future public document.status | Meaning |
| --- | --- | --- |
| pending | processing | The document has been accepted, but extraction work has not completed yet. |
| processing | processing | Extraction or background processing is still running. |
| for_review | review_required | Extraction finished and the document is waiting for human review or approval. |
| reviewed | reviewed | Human review is complete and the extracted result has been confirmed. |
| failed | failed | Extraction or a required processing step failed. |
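
The mapping above can be sketched as a plain lookup table. The names INTERNAL_TO_PUBLIC and to_public_status are hypothetical; only the state strings come from this guide.

```python
# Internal OCR state -> public document.status, as listed in the mapping above.
INTERNAL_TO_PUBLIC = {
    "pending": "processing",
    "processing": "processing",
    "for_review": "review_required",
    "reviewed": "reviewed",
    "failed": "failed",
}


def to_public_status(internal_state: str) -> str:
    """Normalize an internal OCR state into a public document.status value."""
    try:
        return INTERNAL_TO_PUBLIC[internal_state]
    except KeyError:
        # Reject unknown states rather than leaking them to API consumers.
        raise ValueError(f"unknown internal OCR state: {internal_state!r}") from None
```

Raising on unknown input is one possible design choice. It keeps a newly added internal state from reaching external developers before it has been given a public equivalent.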

Future Write-Back Extension

When the public API starts exposing downstream delivery state on the document resource, a document that has reached reviewed may then move on to one of:

  • push_pending
  • push_succeeded
  • push_failed

Those values should describe destination delivery state, not OCR extraction state.
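
One way to keep delivery state separate from OCR state is to derive the public status from two independent inputs. The sketch below assumes a separate delivery state with the values pending, succeeded, and failed. Those names, and the function public_status, are illustrative assumptions, since this guide does not define how write-back progress is stored.

```python
from typing import Optional

# Internal OCR state -> public status (from the Status Mapping section).
OCR_TO_PUBLIC = {
    "pending": "processing",
    "processing": "processing",
    "for_review": "review_required",
    "reviewed": "reviewed",
    "failed": "failed",
}

# Hypothetical delivery states -> public push_* values (assumed names).
DELIVERY_TO_PUBLIC = {
    "pending": "push_pending",
    "succeeded": "push_succeeded",
    "failed": "push_failed",
}


def public_status(ocr_state: str, delivery_state: Optional[str] = None) -> str:
    """Combine OCR state and optional delivery state into one public status."""
    # Delivery state only matters once review is complete.
    if ocr_state == "reviewed" and delivery_state is not None:
        return DELIVERY_TO_PUBLIC[delivery_state]
    return OCR_TO_PUBLIC[ocr_state]
```

With this split, an extraction failure still surfaces as failed, while a delivery failure on a reviewed document surfaces as push_failed.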

Extraction and review

The most common lifecycle is:

pending -> processing -> for_review -> reviewed

Public API view:

processing -> review_required -> reviewed

Extraction failure

If extraction fails:

pending -> processing -> failed

Public API view:

processing -> failed
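
The public views in the two flows above follow from mapping each internal state and then collapsing consecutive duplicates, which is why pending -> processing appears as a single processing step. A minimal sketch, with the hypothetical helper name public_view:

```python
from itertools import groupby

INTERNAL_TO_PUBLIC = {
    "pending": "processing",
    "processing": "processing",
    "for_review": "review_required",
    "reviewed": "reviewed",
    "failed": "failed",
}


def public_view(internal_path):
    """Translate a sequence of internal OCR states into the public sequence."""
    mapped = [INTERNAL_TO_PUBLIC[state] for state in internal_path]
    # groupby merges runs of equal adjacent values into one entry each.
    return [status for status, _ in groupby(mapped)]


print(public_view(["pending", "processing", "for_review", "reviewed"]))
# ['processing', 'review_required', 'reviewed']
print(public_view(["pending", "processing", "failed"]))
# ['processing', 'failed']
```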

Review complete and downstream delivery

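If the document is later coupled to write-back execution, the lifecycle continues past reviewed into the public push_* values, which have no internal OCR equivalent: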

pending -> processing -> for_review -> reviewed -> push_pending -> push_succeeded

or:

pending -> processing -> for_review -> reviewed -> push_pending -> push_failed
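
Taken together, the flows in this guide imply a small set of allowed public status transitions. The sketch below encodes only the transitions shown in this guide. Treating push_failed, push_succeeded, and failed as having no outgoing transitions (for example, no retries) is an assumption, and the names ALLOWED_TRANSITIONS, can_transition, and is_valid_path are hypothetical.

```python
# Public status -> set of public statuses that may follow it.
ALLOWED_TRANSITIONS = {
    "processing": {"review_required", "failed"},
    "review_required": {"reviewed"},
    "reviewed": {"push_pending"},
    "push_pending": {"push_succeeded", "push_failed"},
    # Assumed to be final; this guide does not describe retries.
    "push_succeeded": set(),
    "push_failed": set(),
    "failed": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if the guide's flows allow moving from current to new."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


def is_valid_path(path) -> bool:
    """Check every consecutive pair of statuses in a public lifecycle path."""
    return all(can_transition(a, b) for a, b in zip(path, path[1:]))
```

For example, is_valid_path(["processing", "review_required", "reviewed", "push_pending", "push_failed"]) returns True, while a jump straight from processing to reviewed is rejected.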

Implementation Notes

  • The current OCR subsystem treats for_review, reviewed, and failed as terminal OCR states.
  • The public API should not expose internal OCR state names directly if a more stable normalized status already exists.
  • Public docs should describe review_required as the external equivalent of the internal for_review state.
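
The first note can be expressed as a simple membership check, for instance when a background worker decides whether a document still needs OCR processing. The names OCR_TERMINAL_STATES and is_terminal_ocr_state are hypothetical.

```python
# Internal states the OCR subsystem treats as terminal, per the note above.
OCR_TERMINAL_STATES = frozenset({"for_review", "reviewed", "failed"})


def is_terminal_ocr_state(ocr_state: str) -> bool:
    """Return True once the OCR subsystem has no further work for the document."""
    return ocr_state in OCR_TERMINAL_STATES
```

Under this check, pending and processing are the only internal states that still have OCR work outstanding.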