P0: Aligning AGQ With EXECUTION-LAYERS Spec

by Alex Johnson

Consistent terminology and APIs keep components easy to integrate and reason about. This article covers the task of aligning the nomenclature and API of AGQ (a component within a larger system) with the authoritative EXECUTION-LAYERS specification. The work is designated P0 because it unblocks Phase 1 completion and keeps the system coherent.

Current State vs. Spec: A Deep Dive into Terminology Misalignment

The first step is to compare the current state of AGQ with the definitions in the EXECUTION-LAYERS specification. The comparison reveals several terminology mismatches. Inconsistent terminology leads to confusion, errors, and longer development time, so these discrepancies need to be fixed.

In the current AGQ implementation, the term "Job", as defined in JOB_SCHEMA.md, actually represents a Plan. According to the specification, a Plan is an ordered list of Tasks generated by AGX (another component in the system). Conversely, what is currently referred to as a "Step" should be termed a Task, which represents an atomic execution unit, such as a single tool or AU call. Furthermore, the concept of a "Job" itself is missing in the current AGQ implementation. According to the spec, a Job is a runtime instance of a Plan, tracked by AGQ. Similarly, the concept of an "Action", representing many Jobs applying the same Plan to different inputs, is also absent.

To better understand these misalignments, consider the following table:

Current AGQ Term | Should Be | Definition (per spec)
"Job" (in JOB_SCHEMA.md) | Plan | Ordered list of Tasks (generated by AGX)
"Step" | Task | Atomic execution unit (single tool/AU call)
N/A | Job | Runtime instance of a Plan (tracked by AGQ)
N/A | Action | Many Jobs applying same Plan to different inputs
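
The four terms in the table can be pictured as a small data model. The sketch below is illustrative only: the class layout and the `tool` and `args` fields on a Task are assumptions made for this example and are not taken from the spec.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Task:
    """Atomic execution unit: a single tool or AU call."""
    tool: str                                   # hypothetical field name
    args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Plan:
    """Ordered list of Tasks, generated by AGX."""
    plan_id: str
    tasks: List[Task]


@dataclass
class Job:
    """Runtime instance of a Plan, tracked by AGQ."""
    job_id: str
    plan_id: str
    input: Dict[str, Any]
    status: str = "pending"


@dataclass
class Action:
    """Many Jobs applying the same Plan to different inputs."""
    action_id: str
    plan_id: str
    job_ids: List[str] = field(default_factory=list)


# One Plan, fanned out over two inputs, yields two Jobs under one Action.
plan = Plan("uuid-5678", [Task("fetch"), Task("summarize")])
jobs = [Job(f"job-{i}", plan.plan_id, {"data": name})
        for i, name in enumerate(["input1.txt", "input2.txt"])]
action = Action("uuid-1234", plan.plan_id, [job.job_id for job in jobs])
```

The point to notice is the direction of the references: a Job and an Action both point at a Plan, while the Plan itself knows nothing about the Jobs created from it.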

Addressing API Endpoints: Filling the Gaps

In addition to terminology discrepancies, there are also gaps in the API endpoints provided by AGQ. Currently, AGQ only offers a generic list operation (LPUSH queue:ready {json}). To align with the specification, the following endpoints are required:

  1. PLAN.SUBMIT: This endpoint is responsible for storing a Plan definition. AGX submits reusable Plan templates to AGQ, which then stores them for reuse, returning a plan_id.
  2. ACTION.SUBMIT: This endpoint facilitates the creation of Jobs from a Plan. AGX submits an Action, which includes a Plan and input data. This results in the creation of one or more Jobs, which are then enqueued to queue:ready. The endpoint returns an action_id and the job_id(s) of the created Jobs.
  3. JOB.STATUS: This endpoint allows AGX to monitor the execution state of running Jobs, providing information such as status, timestamps, and logs. This endpoint requires hash storage capabilities (AGQ-006).

Required Changes: A Roadmap to Alignment

To achieve the desired alignment, several changes are required across different components of the system. These changes can be broadly categorized into nomenclature updates, API endpoint implementation, and expectation adjustments.

1. Updating JOB_SCHEMA.md → PLAN_SCHEMA.md

The first step is to rename the file agx/docs/JOB_SCHEMA.md to agx/docs/PLAN_SCHEMA.md. This renaming reflects the fact that the current structure actually represents a Plan, not a Job. Furthermore, the steps field within the schema should be renamed to tasks to align with the correct terminology. The job_id field should be removed, as Jobs are created by AGQ, not AGX. Finally, the documentation should be clarified to emphasize that this schema represents a Plan definition.

Consider the following example of the current structure:

{
  "plan_id": "uuid-5678",      // ✅ Correct - identifies the Plan
  "plan_description": "...",    // ✅ Correct
  "steps": [...]                // ❌ Should be "tasks"
}
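
The schema changes described above (rename `steps` to `tasks`, drop `job_id`) could be applied to existing documents with a small conversion helper. This is a minimal sketch: the function name `job_schema_to_plan` and the sample document, including the `tool` key inside a step, are hypothetical.

```python
import copy


def job_schema_to_plan(doc: dict) -> dict:
    """Convert a document in the old "Job" schema into a Plan definition."""
    plan = copy.deepcopy(doc)          # leave the caller's document untouched
    plan.pop("job_id", None)           # Jobs are created by AGQ, not AGX
    if "steps" in plan:
        plan["tasks"] = plan.pop("steps")
    return plan


legacy = {
    "job_id": "uuid-0001",             # illustrative value
    "plan_id": "uuid-5678",
    "plan_description": "...",
    "steps": [{"tool": "fetch"}],      # hypothetical step content
}
plan = job_schema_to_plan(legacy)
```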

2. Implementing the PLAN.SUBMIT Endpoint

AGQ needs to implement the PLAN.SUBMIT endpoint, which allows AGX to submit Plan definitions for storage and reuse. The endpoint should accept a plan_json as input and return a plan_id. The Plan definition should be stored in AGQ for later retrieval.

The storage mechanism can utilize a STRING table, where plan:{plan_id} maps to the plan_json. Alternatively, a new PLAN_TABLE can be created in redb.
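
To make the STRING-table option concrete, here is a minimal in-memory sketch of PLAN.SUBMIT. A plain dict stands in for AGQ's storage. Reusing a `plan_id` supplied by the caller, and generating one when it is absent, is an assumption of this example; the spec may define this differently.

```python
import json
import uuid

# Stand-in for AGQ's STRING table: key "plan:{plan_id}" maps to the plan JSON.
string_table = {}


def plan_submit(plan_json: str) -> str:
    """Validate and store a Plan definition, returning its plan_id."""
    plan = json.loads(plan_json)
    if not isinstance(plan, dict) or not isinstance(plan.get("tasks"), list):
        raise ValueError("a Plan must be an object with a 'tasks' list")
    # Assumption: keep the caller's plan_id if present, otherwise generate one.
    plan_id = plan.get("plan_id") or str(uuid.uuid4())
    plan["plan_id"] = plan_id
    string_table[f"plan:{plan_id}"] = json.dumps(plan)
    return plan_id


stored_id = plan_submit('{"plan_id": "uuid-5678", "tasks": []}')
```

Rejecting a payload that still uses `steps` at this point would catch producers that have not yet migrated to the new nomenclature.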

3. Implementing the ACTION.SUBMIT Endpoint

AGQ also needs to implement the ACTION.SUBMIT endpoint, which enables AGX to create Jobs from a Plan and enqueue them for execution. This endpoint should accept an action_json as input and return an action_id and a list of job_ids.

The action_json should include the action_id, the plan_id (referencing the stored Plan), and an array of inputs for fan-out parallelism. Here's an example:

{
  "action_id": "uuid-1234",
  "plan_id": "uuid-5678",        // References stored Plan
  "inputs": [                     // Array for fan-out parallelism
    {"data": "input1.txt"},
    {"data": "input2.txt"}
  ]
}

The logic for this endpoint involves loading the Plan from plan:{plan_id}, generating a job_id for each input, creating Job metadata with a status of "pending", and then enqueuing the Job to queue:ready. The Action metadata should also be stored.
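
The steps above can be sketched in memory as follows. Dicts and a deque stand in for AGQ's storage and for `queue:ready`. The key names `job:{job_id}` and `action:{action_id}` and the exact metadata fields are assumptions for illustration, not something the spec confirms.

```python
import json
import uuid
from collections import deque

# In-memory stand-ins for AGQ storage.
strings = {"plan:uuid-5678": json.dumps({"plan_id": "uuid-5678", "tasks": []})}
hashes = {}             # hypothetical keys: "job:{job_id}", "action:{action_id}"
queue_ready = deque()   # stand-in for the queue:ready list


def action_submit(action_json: str) -> dict:
    """Create one Job per input, enqueue each, and record the Action."""
    action = json.loads(action_json)
    action_id, plan_id = action["action_id"], action["plan_id"]
    plan_key = f"plan:{plan_id}"
    if plan_key not in strings:
        raise KeyError(f"unknown plan: {plan_id}")
    plan = json.loads(strings[plan_key])

    job_ids = []
    for job_input in action["inputs"]:
        job_id = str(uuid.uuid4())
        hashes[f"job:{job_id}"] = {"status": "pending", "action_id": action_id}
        envelope = {"job_id": job_id, "action_id": action_id,
                    "plan": plan, "input": job_input}
        queue_ready.appendleft(json.dumps(envelope))   # mirrors LPUSH
        job_ids.append(job_id)

    hashes[f"action:{action_id}"] = {"plan_id": plan_id,
                                     "job_ids": json.dumps(job_ids)}
    return {"action_id": action_id, "job_ids": job_ids}


result = action_submit(json.dumps({
    "action_id": "uuid-1234",
    "plan_id": "uuid-5678",
    "inputs": [{"data": "input1.txt"}, {"data": "input2.txt"}],
}))
```

Because Jobs are pushed on the left and workers pop from the right, the Job for the first input is the first one a worker receives.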

4. Updating AGW Expectations

AGW (another component in the system) needs to be updated to expect Jobs rather than Plans. Workers receive Jobs via BRPOP, and the Job envelope should include the job_id (created by AGQ), the action_id (parent Action), the embedded Plan definition (including the renamed tasks field), and the input specific to that Job.

Here's an example of the expected Job envelope:

{
  "job_id": "uuid-9999",         // Created by AGQ
  "action_id": "uuid-1234",      // Parent Action
  "plan": {                       // Embedded Plan definition
    "plan_id": "uuid-5678",
    "tasks": [...]                // Renamed from steps
  },
  "input": {"data": "input1.txt"} // Specific to this Job
}
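
On the worker side, a small check can reject payloads that are still raw Plans or that still use the old `steps` field. The following is a sketch under the assumption that the four fields shown in the envelope above are all required; the function name `parse_job_envelope` is hypothetical.

```python
import json

REQUIRED_FIELDS = ("job_id", "action_id", "plan", "input")


def parse_job_envelope(raw: str) -> dict:
    """Parse a queue payload and verify it is a Job envelope, not a raw Plan."""
    envelope = json.loads(raw)
    if not isinstance(envelope, dict):
        raise ValueError("Job envelope must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in envelope]
    if missing:
        raise ValueError(f"not a Job envelope, missing fields: {missing}")
    plan = envelope["plan"]
    if not isinstance(plan, dict) or "tasks" not in plan:
        raise ValueError("embedded Plan must be an object with a 'tasks' field")
    return envelope


job = parse_job_envelope(json.dumps({
    "job_id": "uuid-9999",
    "action_id": "uuid-1234",
    "plan": {"plan_id": "uuid-5678", "tasks": []},
    "input": {"data": "input1.txt"},
}))
```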

5. Implementing the JOB.STATUS Endpoint

Finally, AGQ needs to implement the JOB.STATUS endpoint, which allows AGX to query the execution state of a Job. This endpoint should accept a job_id as input and return the Job's status, start time, completion time, standard output, and standard error.

This implementation depends on the availability of HASH operations (AGQ-006) for storing Job metadata.
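
As an illustration of how JOB.STATUS might read from hash-style storage, the sketch below uses a dict of dicts in place of the AGQ-006 HASH operations. The key pattern `job:{job_id}` and the field names `started_at`, `completed_at`, `stdout`, and `stderr` are assumed names for the five values listed above.

```python
# Stand-in for HASH storage: one hash per Job, keyed "job:{job_id}" (assumed).
hashes = {
    "job:uuid-9999": {"status": "pending", "action_id": "uuid-1234"},
}

STATUS_FIELDS = ("status", "started_at", "completed_at", "stdout", "stderr")


def job_status(job_id: str):
    """Return the status record for a Job, or None if the Job is unknown."""
    record = hashes.get(f"job:{job_id}")
    if record is None:
        return None
    # Fields not yet written (for example completed_at on a pending Job)
    # come back as None rather than raising.
    status = {name: record.get(name) for name in STATUS_FIELDS}
    status["job_id"] = job_id
    return status


pending = job_status("uuid-9999")
unknown = job_status("does-not-exist")
```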

Implementation Order: A Phased Approach

To ensure a smooth transition and minimize disruption, the implementation should follow a phased approach:

  1. Phase 1: Update nomenclature in documentation.
    • Rename JOB_SCHEMA.md → PLAN_SCHEMA.md (in agx repo).
    • Update steps → tasks throughout the documentation.
    • Document the distinction between Job, Plan, and Action.
  2. Phase 2: Implement storage in AGQ.
    • Complete AGQ-006 (hash operations).
    • Add Plan storage table.
    • Add Action metadata storage.
    • Add Job metadata storage.
  3. Phase 3: Implement new endpoints in AGQ.
    • PLAN.SUBMIT - Store Plans.
    • ACTION.SUBMIT - Create Jobs.
    • JOB.STATUS - Query Job state.
    • JOB.LIST - List Jobs for an Action.
  4. Phase 4: Update consumers.
    • AGX sends PLAN.SUBMIT, then ACTION.SUBMIT.
    • AGW expects Job envelope (not raw Plan).
    • Update E2E tests.

Acceptance Criteria: Measuring Success

The successful alignment of nomenclature and API can be measured against the following acceptance criteria:

  • All code, documentation, and APIs use canonical nomenclature.
  • AGX can submit Plans via PLAN.SUBMIT.
  • AGX can create Actions via ACTION.SUBMIT.
  • AGQ creates Jobs and enqueues them.
  • AGW receives Jobs (not Plans) via BRPOP.
  • AGX can query Job status via JOB.STATUS.
  • E2E test demonstrates the full flow:
    • AGX → PLAN.SUBMIT
    • AGX → ACTION.SUBMIT
    • AGW → BRPOP (receives Job)
    • AGW → executes Tasks
    • AGX → JOB.STATUS (monitors)
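
The worker half of this flow (receive a Job, execute its Tasks, leave a status behind for JOB.STATUS) can be sketched as a single loop iteration. Everything here is an in-memory stand-in: `run_task` is a hypothetical placeholder for a real tool or AU call, and the status values other than "pending" are assumptions.

```python
import json
from collections import deque

queue_ready = deque()   # stand-in for queue:ready
job_meta = {}           # stand-in for per-Job metadata


def run_task(task: dict, job_input: dict) -> str:
    """Hypothetical Task runner; a real worker would call a tool or AU here."""
    return f"{task['tool']}({job_input['data']})"


def work_once():
    """Pop one Job, run its Tasks in order, and record the outcome."""
    if not queue_ready:
        return None
    envelope = json.loads(queue_ready.pop())    # non-blocking stand-in for BRPOP
    job_id = envelope["job_id"]
    meta = job_meta.setdefault(job_id, {})
    meta["status"] = "running"                  # assumed status value
    outputs = []
    try:
        for task in envelope["plan"]["tasks"]:
            outputs.append(run_task(task, envelope["input"]))
        meta.update(status="completed", stdout="\n".join(outputs))
    except Exception as exc:
        meta.update(status="failed", stderr=str(exc))
    return job_id


# Enqueue one Job the way ACTION.SUBMIT would, then process it.
job_meta["uuid-9999"] = {"status": "pending"}
queue_ready.appendleft(json.dumps({
    "job_id": "uuid-9999",
    "action_id": "uuid-1234",
    "plan": {"plan_id": "uuid-5678",
             "tasks": [{"tool": "fetch"}, {"tool": "summarize"}]},
    "input": {"data": "input1.txt"},
}))
finished = work_once()
```

An E2E test along these lines would then assert on what JOB.STATUS returns for the finished Job.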

Dependencies and Related Issues

This alignment effort is a critical dependency for Phase 1 completion and production readiness. It also depends on the completion of AGQ-006 (hash operations for Job metadata). Related issues include AGQ-011 (RPOPLPUSH for queue transitions), AGQ-012 (failed job handling), and AGQ-014 (plan storage & versioning).

References for Further Exploration

For more in-depth information, refer to the following resources:

  • Authoritative spec: agx/docs/EXECUTION-LAYERS.md
  • Current schema: agx/docs/JOB_SCHEMA.md (needs renaming)
  • AGQ roadmap: ROADMAP.md section 2.2

By addressing the terminology misalignments and implementing the required API endpoints, AGQ can be brought into full compliance with the EXECUTION-LAYERS specification, paving the way for a more robust and consistent system.