Streamline Your Tests: Cut Redundancy, Boost Coverage
In the fast-paced world of software development, a robust and efficient testing strategy is your best friend. It's not just about having tests; it's about having the right tests. Today, we're diving deep into a comprehensive test optimization plan designed to slash unnecessary tests and bolster critical end-to-end (E2E) validation. Our goal? To refine a sprawling test suite of 145 tests, currently hovering around 50% coverage with significant redundancy, down to a lean and mean 115 tests boasting over 95% coverage. This isn't just about reducing numbers; it's about elevating quality and ensuring our core functionalities are rock-solid.
We'll tackle this by first ruthlessly eliminating 45 redundant unit tests that offer little value or are already covered by E2E scenarios. Then, we'll strategically introduce 15 new P0/P1 critical E2E tests, focusing on high-priority requirements (REQ-001, REQ-002, REQ-009, REQ-010, REQ-011, REQ-012). This methodical approach ensures we spend our testing efforts where they matter most, leading to faster feedback cycles and more reliable software. Let's get started on transforming our test suite from a cluttered burden into a powerful asset.
📋 Issue Overview
The objective of this optimization effort is clear: streamline the test suite by removing 45 redundant unit tests and adding 15 crucial P0/P1 end-to-end (E2E) tests. Currently we're managing 145 tests, yet coverage sits around 50%: many tests overlap with one another while critical paths go untested. Our target state after this optimization is a leaner suite of 115 tests that achieves over 95% coverage, with a sharp focus on E2E validation. This shift is directly linked to several high-priority requirements: REQ-001 (P0), REQ-002 (P0), REQ-009 (P1), REQ-010 (P1), REQ-011 (P1), and REQ-012 (P0). By reducing redundancy and focusing on E2E tests for these critical requirements, we aim to significantly improve the reliability and maintainability of our codebase. This isn't just a cleanup; it's a strategic move to ensure our most important features are rigorously tested and validated at the system level, where it counts most. We're moving from quantity to quality, ensuring that each test serves a vital purpose in our quality assurance process.
🎯 Phase 1: Deleting Redundant Tests (45 tests)
This initial phase is all about decluttering. We're going to surgically remove tests that have become obsolete, are covered elsewhere, or simply don't add significant value. Think of it as spring cleaning for our test suite!
Task 1.1: Delete the entire mcp_core_methods.rs file (18 tests)
Reason for Deletion: These are purely logical unit tests whose functionality is already thoroughly validated by our E2E tests. Maintaining them yields diminishing returns and adds maintenance overhead without a proportional increase in confidence. Removing them eliminates duplicated effort and focuses our resources on tests that provide comprehensive system-level validation; the coverage these unit tests provide is superficial compared to the real-world scenarios exercised by the E2E suites.
rm tests/mcp_core_methods.rs
Verification: Running `cargo test` should show no failures. This confirms that removing these tests does not impact the overall health of the suite, since their validation value was minimal and already accounted for elsewhere.
Task 1.2: Optimize provider_config.rs (Delete 2, Keep 1)
Tests to Delete:
- `test_create_default_provider_template`: This test focuses on the generation of configuration templates. Such functionality is inherently part of the E2E setup and validation of providers, so testing its internal logic at the unit level is redundant when its correct behavior is confirmed during E2E runs.
- `test_provider_summary_includes_flags`: This test is primarily concerned with string formatting and output presentation. While correctness in output is important, the core logic of provider summarization is better validated in an E2E context where the actual impact of these flags can be observed. Unit tests focused on minor formatting details become brittle and add little value.
Test to Keep:
test_provider_env_var_generation: This test likely verifies a core, isolated piece of logic related to how environment variables are generated for providers. This isolation makes it a good candidate for a focused unit test, ensuring this specific mechanism works correctly before it's used in broader E2E scenarios. It represents a critical piece of the provider configuration logic that warrants direct unit-level verification.
Implementation:
// tests/provider_config.rs
// Delete the two functions mentioned above, keeping only:
#[test]
fn test_provider_env_var_generation() {
// This test remains as is
}
By keeping test_provider_env_var_generation, we retain a valuable unit test for a specific, critical piece of logic, while removing the redundant tests related to template generation and string formatting.
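To make the distinction concrete, here is a minimal sketch of the kind of assertion this kept test makes. The generate_env_vars helper and the exact Provider fields are assumptions for illustration, loosely mirroring the config used in the Phase 2 tests:

use std::collections::HashMap;

// Hypothetical config type, mirroring the Provider used in the Phase 2 tests.
struct Provider {
    token: Option<String>,
    base_url: Option<String>,
    env: HashMap<String, String>,
}

// Hypothetical helper: map a provider config onto the env vars an AI CLI expects.
fn generate_env_vars(p: &Provider) -> HashMap<String, String> {
    let mut vars = p.env.clone();
    if let Some(token) = &p.token {
        vars.insert("ANTHROPIC_API_KEY".into(), token.clone());
    }
    if let Some(url) = &p.base_url {
        vars.insert("ANTHROPIC_BASE_URL".into(), url.clone());
    }
    vars
}

#[test]
fn test_provider_env_var_generation_sketch() {
    let provider = Provider {
        token: Some("sk-test-123".into()),
        base_url: Some("https://openrouter.ai/api/v1".into()),
        env: HashMap::from([("CUSTOM_KEY".into(), "custom_value".into())]),
    };
    let vars = generate_env_vars(&provider);
    assert_eq!(vars.get("ANTHROPIC_API_KEY").map(String::as_str), Some("sk-test-123"));
    assert_eq!(vars.get("CUSTOM_KEY").map(String::as_str), Some("custom_value"));
}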
Task 1.3: Optimize cli_parser.rs (Delete 8, Keep 6)
This task involves refining the tests for our command-line interface (CLI) parser. Many of these tests inadvertently focus on the underlying library's behavior (like Clap) rather than our application's specific logic. We'll remove tests that validate generic library features and retain those that confirm our application's unique command-line handling.
Tests to Delete:
- `defaults_to_dashboard_when_no_subcommand_given`: Validates the default behavior of Clap when no subcommand is provided. This is a core feature of the library itself and doesn't need to be tested at our application level.
- `parses_push_with_multiple_directories`: Tests array parsing, another common feature of argument-parsing libraries. Our focus should be on the semantics of our commands, not the library's parsing mechanics.
- `parses_help_with_optional_topic`: Validates Clap's help message generation. This is standard functionality and doesn't require application-specific testing.
- `fail_when_provider_flag_missing_value`: Tests Clap's validation rules. While important, the focus should be on ensuring our commands fail meaningfully rather than on how Clap reports the failure.
- `returns_error_when_no_external_tokens_are_provided`: Verifies edge-case validation. This level of detail is often better handled by integration tests that simulate user input and check the overall outcome.
- `invalid_flag_is_treated_as_prompt_token`: Delves into the interpretation of flags, which is more of an internal CLI-library concern.
- `try_parse_fails_for_unknown_top_level_flag`: Tests Clap's error handling for unknown flags; again, a library-level behavior.
- `parses_update_command_with_tool_name`: Tests enum parsing with Clap, standard library functionality.
Tests to Keep:
These tests focus on the integration of our commands and subcommands, ensuring that the CLI parser correctly translates user intent into application actions. They validate the structure and flow of our specific CLI commands:
- `parses_status_and_provider_commands`
- `captures_external_subcommands_for_ai_cli`
- `parse_external_ai_cli_arguments`
- `parses_roles_list_command`
- `parses_update_command_with_no_tool`
- `parses_roles_list_command` (note: this name appears twice in the original list; presumably a sixth, distinct test is intended)
By retaining these, we ensure our CLI behaves as expected from the user's perspective, correctly interpreting commands related to status, providers, external AI CLIs, and roles.
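For reference, a kept test like parses_status_and_provider_commands might look roughly like the sketch below. The Cli and Commands types are illustrative stand-ins for the crate's real Clap definitions (this assumes clap 4 with the derive feature):

use clap::{Parser, Subcommand};

// Illustrative stand-in for the real CLI definition; the actual struct lives in the crate.
#[derive(Parser)]
struct Cli {
    #[command(subcommand)]
    command: Option<Commands>,
}

#[derive(Subcommand)]
enum Commands {
    /// Show the status of running AI CLI tasks.
    Status,
    /// Manage third-party providers.
    Provider { name: Option<String> },
}

#[test]
fn parses_status_and_provider_commands_sketch() {
    // We assert on *our* command semantics, not Clap's parsing mechanics.
    let cli = Cli::try_parse_from(["agentic-warden", "status"]).unwrap();
    assert!(matches!(cli.command, Some(Commands::Status)));

    let cli = Cli::try_parse_from(["agentic-warden", "provider", "openrouter"]).unwrap();
    assert!(matches!(cli.command, Some(Commands::Provider { name: Some(_) })));
}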
Task 1.4: Optimize workflow_orchestration_tests.rs (Delete 5, Keep 6)
Workflows involve multiple steps and potential failure points. We want to ensure our tests capture the critical paths and error handling, rather than focusing on minor implementation details.
Tests to Delete:
- `workflow_planning_single_tool`: This specific case is likely covered by the more general `workflow_planning_multi_tool` test; planning with a single tool is a subset of planning with multiple tools.
- `workflow_planning_input_param_dedup`: Concerns a low-level detail of parameter deduplication. Its correctness is implicitly verified by E2E tests that succeed, so testing it in isolation adds little value.
- `workflow_planning_invalid_json`: Focuses on the error handling of the JSON parsing library itself. We should assume the underlying library works correctly and focus on how our workflow handles valid or invalid workflow structures, not raw JSON errors.
- `code_generation_strips_code_fences`: A string-manipulation task. While necessary, its correctness is better confirmed within code-generation E2E tests, where its impact is visible.
- `code_generation_rejects_empty_output`: An edge case for code generation output, likely covered by broader E2E tests that ensure code generation produces usable output.
Tests to Keep:
These tests focus on the core orchestration and planning logic, including handling multiple tools, potential infeasibility, and error propagation:
- `workflow_planning_multi_tool`: Verifies the core functionality of planning with multiple tools.
- `workflow_planning_infeasible`: Checks how the system handles situations where a plan cannot be formed.
- `workflow_planning_handles_llm_failure`: Crucial for understanding resilience when the underlying LLM encounters issues.
- `code_generation_rejects_infeasible_plan`: Ensures that plans deemed infeasible by the LLM are correctly handled by the code generation step.
- `orchestrate_end_to_end_success`: A vital E2E test validating a successful workflow execution from start to finish.
- `orchestrate_bubbles_planning_error`: Confirms that errors occurring during the planning phase are correctly propagated and reported.
By retaining these, we ensure our workflow orchestration is tested for its ability to handle complex scenarios, LLM interactions, and error conditions effectively.
Task 1.5: Optimize js_orchestrator_tests.rs (Delete 2, Keep 5)
This involves pruning tests related to the JavaScript orchestrator, keeping those that are most critical for security and core functionality.
Tests to Delete:
- `memory_limit_is_checked`: Testing memory limits is complex and similar in nature to the timeout tests. If timeouts are well tested, memory limits are implicitly covered or better tested at a different layer.
- `validator_detects_security_issues`: Tests a specific validator's ability to find security issues. The real value lies in the detection and handling of security issues overall, which is better captured in broader E2E security tests than in a single validator's unit test.
Tests to Keep:
The remaining 5 tests are assumed to cover critical security aspects and core functionalities of the JS orchestrator that are not adequately covered by other means. These are likely focused on ensuring the orchestrator executes JavaScript securely and correctly, handling various inputs and execution contexts without compromising the system.
Task 1.6: Optimize mcp_js_tool_e2e_tests.rs (Delete 1, Keep 4)
We're removing a single test focused on input validation details for JavaScript tools, as this is likely covered by broader E2E tests.
Test to Delete:
- `test_js_tool_input_validation`: The validation of inputs for JavaScript tools is crucial, but this particular test is likely too granular. If the overall E2E tests for JS tools ensure they process correct inputs and reject incorrect ones gracefully, this test becomes redundant.
Tests to Keep:
The remaining 4 tests likely cover the core E2E scenarios for JS tools, including successful execution, error handling in execution, interaction with the MCP, and potentially security aspects. These provide more value by validating the tool's behavior in a realistic end-to-end context.
Task 1.7: Optimize roles_tests.rs (Delete 2, Keep 4)
This task focuses on refining tests related to role definitions, removing boundary condition checks and redundant list tests.
Tests to Delete:
- `role_file_parses_with_description_and_content`: Checks the parsing of role files, including optional fields like description and content. If a `list` command or similar E2E test already verifies that roles are loaded correctly with their content, this parsing test is redundant.
- `enforces_role_file_size_limit`: Tests a boundary condition, the maximum file size for role definitions. While important, such boundary checks are often implicitly handled by underlying file-system operations or are better exercised in a broader integration test than in a focused unit test.
Tests to Keep:
The 4 remaining tests are assumed to be security-critical. This could include tests for role execution permissions, preventing privilege escalation, ensuring roles cannot perform unintended actions, and validating the integrity of role execution environments. These are vital for maintaining system security.
Task 1.8: Optimize dynamic_tool_registry.rs (Delete 1, Keep 3)
We're removing a test focused on concurrent registration and unregistration, as this scenario is likely covered by performance or load testing.
Test to Delete:
- `test_concurrent_register_and_unregister`: Exercises the dynamic tool registry under concurrent operations. Concurrency behavior of this kind is better validated by dedicated performance or load testing. Since the registry is generally stable and these scenarios are unlikely to cause critical failures in normal operation, the test can be removed from the standard suite.
Tests to Keep:
The remaining 3 tests likely cover the core functionality of the dynamic tool registry: successful registration, unregistration, lookup, and perhaps error handling for invalid registrations. These ensure the fundamental operations of the registry are sound.
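As a sketch of what those kept tests cover, the round-trip below uses a deliberately simplified stand-in for the real registry; the actual DynamicToolRegistry API may differ:

use std::collections::HashMap;

// Simplified stand-in: tool name -> schema, with duplicate registration rejected.
#[derive(Default)]
struct DynamicToolRegistry {
    tools: HashMap<String, String>,
}

impl DynamicToolRegistry {
    fn register(&mut self, name: &str, schema: &str) -> Result<(), String> {
        if self.tools.contains_key(name) {
            return Err(format!("tool '{name}' already registered"));
        }
        self.tools.insert(name.to_string(), schema.to_string());
        Ok(())
    }
    fn unregister(&mut self, name: &str) -> bool {
        self.tools.remove(name).is_some()
    }
    fn lookup(&self, name: &str) -> Option<&String> {
        self.tools.get(name)
    }
}

#[test]
fn register_lookup_unregister_roundtrip() {
    let mut reg = DynamicToolRegistry::default();
    reg.register("git_status", "{}").unwrap();
    assert!(reg.lookup("git_status").is_some());        // registration is visible
    assert!(reg.register("git_status", "{}").is_err()); // duplicates are rejected
    assert!(reg.unregister("git_status"));              // unregistration succeeds
    assert!(reg.lookup("git_status").is_none());        // and is visible afterwards
}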
🆕 Phase 2: Adding Critical P0 End-to-End Tests (5 tests)
Now, we shift gears from deletion to creation. This phase focuses on bolstering our E2E test coverage for the highest priority requirements (P0), ensuring the core functionalities are exceptionally robust.
Task 2.1: Create process_tree_e2e_tests.rs (REQ-001 - P0)
This new file will contain E2E tests specifically designed to validate REQ-001, which focuses on AI CLI process tree tracking. These tests will ensure proper isolation and identification of AI CLI processes within the system's hierarchy.
New E2E Tests (3):
//! Process Tree E2E Tests
//! Tests REQ-001: AI CLI Process Tree Tracking
#[tokio::test]
#[serial]
async fn test_process_tree_isolation_multi_ai_cli() {
// Start multiple AI CLIs (claude, codex, gemini) to simulate concurrent operations.
let claude_task = spawn_ai_cli("claude", "task1").await;
let codex_task = spawn_ai_cli("codex", "task2").await;
let gemini_task = spawn_ai_cli("gemini", "task3").await;
// Verify that the process tree correctly groups these AI CLIs and their children.
let tree = get_process_tree().await;
assert_eq!(tree.groups.len(), 3, "Expected three distinct process groups for the AI CLIs.");
// Crucially, verify memory isolation between different AI CLI processes.
let claude_memory = get_shared_memory(claude_task.pid).await;
let codex_memory = get_shared_memory(codex_task.pid).await;
assert_ne!(claude_memory.namespace, codex_memory.namespace, "Memory namespaces should be isolated between different AI CLIs.");
}
#[tokio::test]
async fn test_root_ai_cli_detection() {
    // Simulate a deep process tree (e.g., 5 levels: parent -> child1 -> ... -> child5),
    // where 'parent' is a specific AI CLI (e.g., codex). spawn_deep_process_tree is a
    // hypothetical fixture builder; the real test will need an equivalent helper.
    let (parent_pid, child5_pid) = spawn_deep_process_tree("codex", 5).await;
    // Assert that the system correctly identifies the root AI CLI even in a deep hierarchy.
    let root = detect_root_ai_cli(child5_pid).await.expect("Should detect the root AI CLI.");
    assert_eq!(root.ai_type, AiCliType::Codex, "The detected root AI CLI should be Codex.");
    assert_eq!(root.pid, parent_pid, "The PID of the detected root should match the parent process.");
}
#[tokio::test]
async fn test_cross_platform_process_detection() {
    // Spawn a probe process so there is a concrete PID to inspect.
    let pid = spawn_ai_cli("codex", "probe").await.pid;
    // Ensure process detection works correctly on different operating systems.
    #[cfg(target_os = "windows")]
    {
        // Verify Windows-specific API calls for process information retrieval.
        let process = get_process_info_windows(pid).await.unwrap();
        assert!(process.parent_pid > 0, "Windows process detection failed to retrieve parent PID.");
    }
    #[cfg(not(target_os = "windows"))]
    {
        // Verify Unix-specific methods (e.g., reading from /proc).
        let process = get_process_info_unix(pid).await.unwrap();
        assert!(process.parent_pid > 0, "Unix process detection failed to retrieve parent PID.");
    }
}
Dependencies: These tests require helper functions such as spawn_ai_cli (to start AI CLI processes), get_process_tree (to inspect the process hierarchy), and detect_root_ai_cli (to find the top-level AI process), along with platform-specific process-info helpers (get_process_info_windows / get_process_info_unix) and a fixture builder for deep process trees. These helpers are crucial for simulating and verifying process-tree behavior; possible signatures are sketched below.
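A possible shape for these helpers in tests/common/mod.rs. The type and field names are assumptions for illustration, not confirmed by the codebase, and bodies are elided:

// tests/common/mod.rs (sketch; bodies elided with todo!()).
#[derive(Debug)]
pub struct TaskInfo { pub pid: i32 }

#[derive(Debug)]
pub struct ProcessGroup { pub root_pid: i32, pub members: Vec<i32> }

#[derive(Debug)]
pub struct ProcessTree { pub groups: Vec<ProcessGroup> }

#[derive(Debug, PartialEq)]
pub enum AiCliType { Claude, Codex, Gemini }

#[derive(Debug)]
pub struct RootAiCli { pub pid: i32, pub ai_type: AiCliType }

/// Spawn an AI CLI process running the given task and report its PID.
pub async fn spawn_ai_cli(ai_type: &str, task: &str) -> TaskInfo { todo!() }

/// Snapshot the current process hierarchy, grouped by root AI CLI.
pub async fn get_process_tree() -> ProcessTree { todo!() }

/// Walk up from `pid` to find the top-level AI CLI process, if any.
pub async fn detect_root_ai_cli(pid: i32) -> Option<RootAiCli> { todo!() }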
Task 2.2: Create provider_injection_e2e_tests.rs (Supplement REQ-002 - P0)
This new test file will focus on E2E scenarios for REQ-002, specifically addressing the injection of environment variables from third-party providers into AI CLI processes. This ensures that configurations are correctly applied.
New Integration Tests (2):
//! Provider Injection E2E Tests
//! Tests REQ-002: Third-Party Provider Management (Environment Variable Injection)
#[tokio::test]
#[serial]
async fn test_provider_env_injection_to_ai_cli() {
// 1. Configure a third-party provider (e.g., openrouter) with custom environment variables.
let config = ProvidersConfig {
default_provider: "openrouter".into(),
providers: HashMap::from([(
"openrouter".into(),
Provider {
token: Some("sk-test-123".into()),
base_url: Some("https://openrouter.ai/api/v1".into()),
scenario: None,
env: HashMap::from([
("CUSTOM_KEY".into(), "custom_value".into()),
]),
},
)]),
};
config.save().await.unwrap(); // Persist the configuration.
// 2. Start an AI CLI, explicitly specifying the configured provider.
let task = start_task(StartTaskParams {
ai_type: "codex".into(),
task: "test task".into(),
provider: Some("openrouter".into()),
role: None,
}).await.unwrap();
// 3. Verify that the environment variables (including custom ones) are correctly injected into the AI CLI process.
let env_vars = get_process_env_vars(task.pid).await.unwrap();
assert_eq!(env_vars.get("ANTHROPIC_API_KEY"), Some(&"sk-test-123".to_string()));
assert_eq!(env_vars.get("ANTHROPIC_BASE_URL"), Some(&"https://openrouter.ai/api/v1".to_string()));
assert_eq!(env_vars.get("CUSTOM_KEY"), Some(&"custom_value".to_string()), "Custom environment variable was not injected.");
// Clean up by stopping the task.
stop_task(StopTaskParams { pid: task.pid }).await.unwrap();
}
#[tokio::test]
async fn test_provider_compatibility_validation() {
// Configure a provider that is intentionally incompatible with a specific AI CLI (e.g., 'codex').
let config = ProvidersConfig {
default_provider: "incompatible".into(),
providers: HashMap::from([(
"incompatible".into(),
Provider {
token: Some("invalid".into()),
base_url: None,
scenario: Some("Only for claude".into()), // Explicitly state incompatibility.
env: HashMap::new(),
},
)]),
};
config.save().await.unwrap();
// Attempt to start an AI CLI task using the incompatible provider.
let result = start_task(StartTaskParams {
ai_type: "codex".into(), // This AI CLI is not compatible with the provider.
task: "test".into(),
provider: Some("incompatible".into()),
role: None,
}).await;
// Assert that the startup check correctly fails due to incompatibility.
assert!(result.is_err(), "Starting task with an incompatible provider should fail.");
let err_msg = result.unwrap_err().to_string();
assert!(err_msg.contains("incompatible") || err_msg.contains("not supported"), "Error message should indicate incompatibility.");
}
Dependencies: These tests rely on a helper function get_process_env_vars to retrieve the environment variables of a running process, which is essential for verification.
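On Linux, get_process_env_vars has a natural implementation via procfs; the sketch below assumes that approach. Note that /proc/<pid>/environ reflects the environment at exec time and generally requires same-user or root privileges to read, and Windows would need a different mechanism:

use std::collections::HashMap;

// Linux-only sketch: /proc/<pid>/environ holds the process environment as
// NUL-separated KEY=VALUE pairs. Windows support is omitted here.
#[cfg(target_os = "linux")]
pub async fn get_process_env_vars(pid: i32) -> std::io::Result<HashMap<String, String>> {
    let raw = tokio::fs::read(format!("/proc/{pid}/environ")).await?;
    let vars = raw
        .split(|&b| b == 0)                  // entries are NUL-separated
        .filter(|entry| !entry.is_empty())
        .filter_map(|entry| {
            let s = String::from_utf8_lossy(entry);
            let (k, v) = s.split_once('=')?; // each entry is KEY=VALUE
            Some((k.to_string(), v.to_string()))
        })
        .collect();
    Ok(vars)
}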
🆕 Phase 3: Adding Critical P1 End-to-End Tests (8 tests)
This phase introduces E2E tests for P1 (Priority 1) requirements, focusing on user interaction, data persistence, and system updates. These tests ensure a smooth and reliable user experience.
Task 3.1: Create interactive_mode_e2e_tests.rs (REQ-009 - P1)
This new file will contain E2E tests for REQ-009, focusing on the interactive mode of the AI CLI. It will verify starting the CLI, handling signals, and ensuring proper resource cleanup.
New E2E Tests (2):
//! Interactive Mode E2E Tests
//! Tests REQ-009: Interactive AI CLI Startup
#[tokio::test]
#[serial]
async fn test_interactive_mode_with_provider() {
// 1. Set up a test provider configuration to ensure it's available.
setup_test_provider("openrouter").await;
// 2. Launch the AI CLI in interactive mode, specifying a provider.
let mut child = Command::new("agentic-warden")
.args(&["claude", "-p", "openrouter"])
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn()
.unwrap();
// 3. Short delay to allow the process to start.
tokio::time::sleep(Duration::from_secs(1)).await;
assert!(child.try_wait().unwrap().is_none(), "Process should be running and not exited.");
// 4. Verify that environment variables from the configured provider are injected.
let pid = child.id().expect("child should have a PID") as i32; // tokio's Child::id() returns Option<u32>
let env = get_process_env_vars(pid).await.unwrap();
assert!(env.contains_key("ANTHROPIC_API_KEY"), "Provider API key should be injected.");
// 5. Simulate graceful exit using Ctrl+D (EOF on stdin).
drop(child.stdin.take()); // Closing stdin simulates EOF.
let status = child.wait().await.unwrap();
assert!(status.success(), "Process should exit cleanly.");
// 6. Verify that the task is marked as completed after exit.
tokio::time::sleep(Duration::from_millis(500)).await; // Allow time for status update.
let tasks = list_tasks().await.unwrap();
let task = tasks.iter().find(|t| t.pid == pid);
// Task might be removed or marked as completed.
assert!(task.is_none() || task.unwrap().status == TaskStatus::Completed, "Task status should be Completed after graceful exit.");
}
#[tokio::test]
#[serial]
async fn test_interactive_signal_handling() {
// Start the AI CLI in interactive mode.
let mut child = Command::new("agentic-warden")
.args(&["codex"])
.spawn()
.unwrap();
let pid = child.id().expect("child should have a PID") as i32; // tokio's Child::id() returns Option<u32>
tokio::time::sleep(Duration::from_millis(500)).await;
// Send SIGINT (Ctrl+C) to the process.
#[cfg(unix)]
{
use nix::sys::signal::{kill, Signal};
use nix::unistd::Pid;
kill(Pid::from_raw(pid), Signal::SIGINT).unwrap();
}
#[cfg(windows)]
{
// Use taskkill for Windows equivalent.
Command::new("taskkill")
.args(&["/PID", &pid.to_string(), "/T"])
.output()
.unwrap();
}
// Wait for the process to terminate, with a timeout to prevent hanging.
let status = tokio::time::timeout(Duration::from_secs(5), child.wait())
.await
.expect("Process should exit within 5s")
.unwrap();
// Verify that the process has been cleaned up.
assert!(!platform::process_alive(pid), "Process should no longer be alive after termination.");
}
Task 3.2: Create conversation_history_e2e_tests.rs (REQ-010 - P1)
This new file introduces E2E tests for REQ-010, focusing on the integration of conversation history for Claude Code. It will verify transcript processing, TODO extraction, and searching historical data.
New E2E Tests (3):
//! Conversation History E2E Tests
//! Tests REQ-010: Claude Code Conversation History Integration
#[tokio::test]
async fn test_claude_code_hook_integration() {
// 1. Prepare a sample transcript file for a test session.
let session_id = "test-session-123";
let transcript_path = create_test_transcript(session_id, vec![
("user", "Help me implement auth"),
("assistant", "I'll help you...\n- [ ] Create login endpoint\n- [ ] Add JWT validation"),
]).await;
// Construct the input payload for the hooks handler, simulating a 'SessionEnd' event.
let hook_input = json!({
"session_id": session_id,
"transcript_path": transcript_path.to_str().unwrap(),
"hook_event_name": "SessionEnd",
"cwd": "/tmp",
"permission_mode": "normal"
});
// 2. Execute the `agentic-warden hooks handle` command with the prepared input.
let mut child = Command::new("agentic-warden")
.args(&["hooks", "handle"])
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn()
.unwrap();
let mut stdin = child.stdin.take().unwrap();
stdin.write_all(hook_input.to_string().as_bytes()).await.unwrap();
drop(stdin);
let output = child.wait_with_output().await.unwrap();
assert_eq!(output.status.code(), Some(0), "Hook handler should execute successfully.");
// 3. Verify that the conversation history is stored and searchable in SahomeDB.
let history_store = ConversationHistoryStore::new(
&PathBuf::from("~/.config/agentic-warden/conversation_history.db"),
384
).unwrap();
let results = history_store.search("implement auth", Some(session_id), 10).await.unwrap();
assert!(!results.is_empty(), "Search should return relevant history entries.");
// 4. Validate that TODO items are correctly extracted and stored within the metadata.
let record = &results[0];
let todos: Vec<String> = serde_json::from_str(record.metadata.get("todos").unwrap()).unwrap();
assert_eq!(todos.len(), 2, "Two TODO items should be extracted.");
assert!(todos[0].contains("Create login endpoint"));
}
#[tokio::test]
async fn test_todo_extraction_from_transcript() {
// Create a transcript containing explicit TODO items and action items.
let transcript = create_test_transcript("session-456", vec![
("user", "Create API"),
("assistant", "TODO: Design schema\nAction Items:\n- Implement endpoints\n- Write tests"),
]).await;
// Process the transcript via the hook handler.
process_hook_stdin(json!({
"session_id": "session-456",
"transcript_path": transcript.to_str().unwrap(),
"hook_event_name": "SessionEnd",
})).await.unwrap();
// Retrieve and verify the extracted TODOs from the history store.
let store = ConversationHistoryStore::new(
&PathBuf::from("~/.config/agentic-warden/conversation_history.db"),
384
).unwrap();
let results = store.search("API", Some("session-456"), 10).await.unwrap();
let todos: Vec<String> = serde_json::from_str(
results[0].metadata.get("todos").unwrap()
).unwrap();
assert!(todos.iter().any(|t| t.contains("Design schema")));
assert!(todos.iter().any(|t| t.contains("Implement endpoints")));
assert!(todos.iter().any(|t| t.contains("Write tests")));
}
#[tokio::test]
async fn test_search_history_mcp_tool() {
// 1. Populate the history store with sample conversations.
setup_test_conversations(vec![
("session-1", "Implement authentication with JWT"),
("session-2", "Create user registration API"),
("session-3", "Fix database connection pooling"),
]).await;
// 2. Use the `search_history` MCP tool to query for relevant conversations.
let result = search_history(SearchHistoryParams {
query: "authentication JWT".into(),
session_id: None,
limit: Some(5),
}).await.unwrap();
// 3. Validate that the search returns semantically relevant results.
assert!(!result.results.is_empty());
assert!(result.results[0].content.contains("authentication"));
// 4. Ensure that TODO items, if present in the matched history entries, are also returned.
if let Some(todos) = &result.results[0].todos {
assert!(!todos.is_empty(), "TODO items should be returned if available in search results.");
}
}
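Of these helpers, create_test_transcript (used above) might be sketched as follows, assuming a simple JSONL encoding with role/content fields; the real Claude Code transcript format is richer, so the helper should mirror whatever the hooks handler actually parses:

use std::path::PathBuf;

// Sketch only: writes one JSON object per line. The field names ("role",
// "content") are assumptions for illustration.
pub async fn create_test_transcript(session_id: &str, messages: Vec<(&str, &str)>) -> PathBuf {
    let path = std::env::temp_dir().join(format!("{session_id}.jsonl"));
    let mut lines = String::new();
    for (role, content) in messages {
        lines.push_str(&serde_json::json!({ "role": role, "content": content }).to_string());
        lines.push('\n');
    }
    tokio::fs::write(&path, lines).await.unwrap();
    path
}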
Task 3.3: Create ai_cli_update_e2e_tests.rs (REQ-011 - P1)
This file introduces E2E tests for REQ-011, focusing on the management of AI CLI updates and installations. It will cover updating via package managers like npm and handling native CLI updates.
New Integration Tests (3):
//! AI CLI Update E2E Tests
//! Tests REQ-011: AI CLI Update/Installation Management
use mockito::Server;
#[tokio::test]
async fn test_update_npm_package() {
    // 1. Mock the npm registry API to return a specific version for a package.
    //    (mockito 1.x uses the Server API; the crate-level mock()/server_url() helpers are 0.x-only.)
    let mut server = Server::new_async().await;
    let _mock = server
        .mock("GET", "/openai/codex/latest")
        .with_status(200)
        .with_header("content-type", "application/json")
        .with_body(r#"{"version": "2.0.0"}"#)
        .create_async()
        .await;
    // 2. Create a mock npm binary to intercept and simulate the npm install command.
    let npm_bin = create_mock_npm_bin().await;
    let _guard = EnvGuard::set("PATH", npm_bin.parent().unwrap().to_str().unwrap()); // Add the mock dir to PATH.
    // 3. Execute the `agentic-warden update codex` command.
    let output = Command::new("agentic-warden")
        .args(&["update", "codex"])
        .env("NPM_REGISTRY", server.url()) // Point to our mock registry.
        .output()
        .await
        .unwrap();
    assert!(output.status.success(), "Update command should succeed.");
    let stdout = String::from_utf8(output.stdout).unwrap();
    // Verify that the output indicates the correct version or a successful update.
    assert!(stdout.contains("2.0.0") || stdout.contains("Updated"));
}
#[tokio::test]
async fn test_update_claude_native() {
// 1. Mock the 'claude' executable to simulate its behavior.
let claude_bin = create_mock_claude_bin().await;
let _guard = EnvGuard::set("PATH", claude_bin.parent().unwrap().to_str().unwrap());
// 2. Execute the `agentic-warden update claude` command.
let output = Command::new("agentic-warden")
.args(&["update", "claude"])
.output()
.await
.unwrap();
assert!(output.status.success(), "Native update command should succeed.");
// 3. Check the mock log to ensure the correct update command was invoked.
let mock_log = read_mock_claude_log().await;
assert!(mock_log.contains("update"), "The mock 'claude' binary should have been called with an 'update' argument.");
}
#[tokio::test]
async fn test_update_network_error_handling() {
    // Simulate a network failure by returning a 500 status code from the registry mock.
    let mut server = Server::new_async().await;
    let _mock = server
        .mock("GET", "/openai/codex/latest")
        .with_status(500)
        .create_async()
        .await;
    // Execute the update command against the failing mock server.
    let result = Command::new("agentic-warden")
        .args(&["update", "codex"])
        .env("NPM_REGISTRY", server.url())
        .output()
        .await
        .unwrap();
    // Assert that the update command fails due to the network error.
    assert!(!result.status.success(), "Update command should fail on network error.");
    // Verify that the error message presented to the user is informative.
    let stderr = String::from_utf8(result.stderr).unwrap();
    assert!(stderr.contains("network") || stderr.contains("failed") || stderr.contains("registry"), "A user-friendly error message for network issues is expected.");
}
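The EnvGuard these tests assume is a small RAII helper that overrides an environment variable for the test's lifetime and restores the previous value on drop. A minimal sketch follows; note that std::env::set_var mutates process-global state, which is also why such tests belong under #[serial]:

use std::env;

// Minimal RAII guard sketch: override an env var now, restore the old value on drop.
// The real helper may prepend to PATH rather than replace it outright.
pub struct EnvGuard {
    key: String,
    previous: Option<String>,
}

impl EnvGuard {
    pub fn set(key: &str, value: &str) -> Self {
        let previous = env::var(key).ok();
        env::set_var(key, value);
        EnvGuard { key: key.to_string(), previous }
    }
}

impl Drop for EnvGuard {
    fn drop(&mut self) {
        match &self.previous {
            Some(old) => env::set_var(&self.key, old),
            None => env::remove_var(&self.key),
        }
    }
}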
🆕 Phase 4: Adding Smart Routing Integration Tests (2 tests)
This phase introduces integration tests for smart routing capabilities, specifically focusing on the integration with Claude Code (REQ-012). These tests ensure the routing system correctly identifies, registers, and executes tools dynamically.
Task 4.1: Create mcp_intelligent_route_claude_code_e2e.rs (REQ-012 Supplement)
This new file will contain E2E tests demonstrating the dynamic tool registration and execution flow orchestrated by the intelligent routing system when interacting with Claude Code.
New Claude Code Integration Tests (2):
//! MCP Intelligent Route - Claude Code Integration E2E Tests
//! Tests REQ-012: Intelligent MCP Routing System (Claude Code Integration)
#[tokio::test]
async fn test_list_tools_dynamic_refresh() {
// 1. Initialize and start the MCP server for testing.
let mcp_server = start_mcp_server().await;
// 2. Initially list available tools. Expecting base tools like 'intelligent_route' and 'search_history'.
let initial_tools = mcp_server.list_tools().await.unwrap();
assert_eq!(initial_tools.len(), 2, "Should initially list only base tools.");
// 3. Call the 'intelligent_route' tool with a request that should trigger dynamic tool registration.
let route_result = mcp_server.call_tool("intelligent_route", json!({
"user_request": "Check git status",
"execution_mode": "dynamic"
})).await.unwrap();
// Verify that a new tool (e.g., for git status) was indeed registered dynamically.
assert!(route_result["dynamically_registered"].as_bool().unwrap(), "Intelligent route should indicate dynamic registration.");
// 4. List tools again and assert that the newly registered tool is now available.
let updated_tools = mcp_server.list_tools().await.unwrap();
assert!(updated_tools.len() > initial_tools.len(), "Tool list should increase after dynamic registration.");
assert!(updated_tools.iter().any(|t| t.name.contains("git")), "Newly registered git tool should be present.");
// 5. Verify that the tool refresh is fast (e.g., completes within 1 second).
let refresh_start = Instant::now();
let _ = mcp_server.list_tools().await.unwrap(); // Perform another list operation.
assert!(refresh_start.elapsed() < Duration::from_secs(1), "Tool list refresh should be fast.");
}
#[tokio::test]
async fn test_orchestrated_tool_execution_via_claude_code() {
// Initialize the MCP server.
let mcp_server = start_mcp_server().await;
// 1. Use 'intelligent_route' to dynamically generate an orchestrated tool based on a user request.
let route_result = mcp_server.call_tool("intelligent_route", json!({
"user_request": "Create a git report with status and commit summary",
"execution_mode": "dynamic"
})).await.unwrap();
let tool_name = route_result["selected_tool"]["tool_name"].as_str().unwrap();
// 2. Retrieve the schema for the dynamically generated tool.
let tools = mcp_server.list_tools().await.unwrap();
let orchestrated_tool = tools.iter().find(|t| t.name == tool_name).expect("Dynamically generated tool not found.");
assert!(orchestrated_tool.description.is_some(), "Dynamically generated tool should have a description.");
assert!(!orchestrated_tool.input_schema.is_empty(), "Dynamically generated tool should have an input schema.");
// 3. Execute the generated orchestrated tool with appropriate input.
let execution_result = mcp_server.call_tool(tool_name, json!({
"repo_path": "/tmp/test-repo" // Example input.
})).await.unwrap();
// 4. Validate that the tool execution completed successfully.
assert!(execution_result["ok"].as_bool().unwrap_or(false), "Orchestrated tool execution should report success.");
}
✅ Acceptance Criteria
Phase 1: Deletion Acceptance
- [x] 45 redundant tests have been successfully deleted.
- [x] The `cargo test` command executes without any failures.
- [x] All remaining tests pass successfully.
- [x] The total number of tests is reduced to approximately 100.
Phase 2: P0 Test Acceptance
- [ ] All 3 tests in `process_tree_e2e_tests.rs` pass successfully.
- [ ] Both tests in `provider_injection_e2e_tests.rs` pass successfully.
- [ ] Coverage for REQ-001 reaches 90% or higher.
- [ ] Coverage for REQ-002 reaches 90% or higher.
Phase 3: P1 Test Acceptance
- [ ] Both tests in `interactive_mode_e2e_tests.rs` pass successfully.
- [ ] All 3 tests in `conversation_history_e2e_tests.rs` pass successfully.
- [ ] All 3 tests in `ai_cli_update_e2e_tests.rs` pass successfully.
- [ ] Coverage for REQ-009, REQ-010, and REQ-011 reaches 80% or higher.
Phase 4: Smart Routing Acceptance
- [ ] Both tests in `mcp_intelligent_route_claude_code_e2e.rs` pass successfully.
- [ ] Coverage for REQ-012 reaches 95% or higher.
Final Acceptance
- [ ] Total Tests: 115 (145 - 45 deletions + 15 additions).
- [ ] Overall Coverage: 95% or greater.
- [ ] P0 Requirement Coverage: All P0 requirements covered at > 90%.
- [ ] P1 Requirement Coverage: All P1 requirements covered at > 80%.
- [ ] CI/CD Status: All checks in the Continuous Integration and Continuous Deployment pipeline pass.
📦 Technical Constraints
New Test Dependencies
To support the new tests, the following dependencies need to be added or updated in Cargo.toml:
[dev-dependencies]
mockito = "1.2" # For mocking HTTP requests in tests
nix = "0.27" # For Unix signal handling (Unix-only)
serial_test = "3.0" # Required for running tests serially (already present)
tokio = { version = "1.35", features = ["full"] } # Async runtime (already present)
Test Helper Functions
Several new helper functions will need to be implemented in tests/common/mod.rs to support the new E2E tests. These functions abstract common testing operations:
- `spawn_ai_cli(ai_type, task) -> TaskInfo`: Spawns a new AI CLI process for testing.
- `get_process_tree() -> ProcessTree`: Retrieves the current process tree structure.
- `detect_root_ai_cli(pid) -> RootAiCli`: Identifies the root AI CLI process from a given PID.
- `get_process_env_vars(pid) -> HashMap<String, String>`: Fetches the environment variables of a specified process.
- `create_test_transcript(session_id, messages) -> PathBuf`: Creates a temporary transcript file for testing conversation history.
- `setup_test_provider(name) -> ()`: Configures a specific provider for testing purposes.
- `create_mock_npm_bin() -> PathBuf`: Creates a mock executable for the npm command (a possible implementation is sketched below).
- `create_mock_claude_bin() -> PathBuf`: Creates a mock executable for the claude command.
- `start_mcp_server() -> McpServerTestHarness`: Starts a test harness for the MCP server.
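As one example, create_mock_npm_bin could be implemented by writing a tiny shell shim that records its arguments and exits successfully. A Unix-only sketch under that assumption (Windows would need a .cmd shim):

use std::path::PathBuf;

// Sketch: the tests only need "npm" to exit 0 and record how it was invoked.
#[cfg(unix)]
pub async fn create_mock_npm_bin() -> PathBuf {
    use std::os::unix::fs::PermissionsExt;

    let dir = std::env::temp_dir().join("mock-npm-bin");
    tokio::fs::create_dir_all(&dir).await.unwrap();
    let bin = dir.join("npm");
    // The shim appends its arguments to a log file next to the binary and exits 0.
    let script = "#!/bin/sh\necho \"$@\" >> \"$0.log\"\nexit 0\n";
    tokio::fs::write(&bin, script).await.unwrap();
    let mut perms = tokio::fs::metadata(&bin).await.unwrap().permissions();
    perms.set_mode(0o755); // mark the shim executable
    tokio::fs::set_permissions(&bin, perms).await.unwrap();
    bin
}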
🔗 Associated Specifications
This test optimization effort directly addresses the following requirements:
- REQ-001: AI CLI Process Tree Tracking (P0)
- REQ-002: Third-Party Provider Management (P0)
- REQ-009: Interactive AI CLI Startup (P1)
- REQ-010: Claude Code Conversation History Integration (P1)
- REQ-011: AI CLI Update/Installation Management (P1)
- REQ-012: Intelligent MCP Routing System (P0)
📅 Estimated Effort
The total estimated effort for this test optimization plan is broken down as follows:
- Phase 1 (Deletion): 2 hours (Focus on quick removal of redundant tests).
- Phase 2 (P0 Tests): 6 hours (Developing critical E2E tests for high-priority requirements).
- Phase 3 (P1 Tests): 8 hours (Implementing E2E tests for important user-facing features).
- Phase 4 (Routing Integration): 2 hours (Adding integration tests for the smart routing system).
- Total Estimated Time: 18 hours.
🎯 Execution Order
The following sequence is recommended for executing this plan:
- Phase 1 (Deletion): Execute immediately to reduce maintenance burden and simplify the test suite.
- Phase 2 (P0 Tests): Prioritize these tests as they cover the most critical functionalities.
- Phase 4 (Routing Integration): Implement these next, leveraging existing infrastructure where possible.
- Phase 3 (P1 Tests): Complete the remaining P1 tests, ensuring all key requirements are addressed.
By following this structured approach, we can efficiently optimize our test suite, enhance code quality, and ensure the reliability of our core features.