Architecture

This page replaces the former root-level ARCHITECTURE.md file.

System Overview

The CLI is split into five subsystems:
  • command parsing and validation
  • auth and host resolution
  • native search execution
  • enhanced discovery and reranking
  • rendering and error reporting

Request Pipeline

Native Search

  1. Resolve effective host.
  2. Resolve host-scoped PAT.
  3. Parse the query and structured flags.
  4. Validate conflicts against raw query qualifiers.
  5. Build one GitHub repository-search REST request.
  6. Fetch one result set.
  7. Apply post-filters that are not natively expressible, such as updated-*.
  8. Render in the selected output format.

Enhanced Discovery

  1. Resolve effective host and PAT.
  2. Parse and validate flags.
  3. Build the seed search.
  4. Collect candidates using the staged fan-out plan for the selected depth.
  5. Dedupe on full_name.
  6. Enrich candidates with repo metadata needed for activity and quality scoring.
  7. Optionally enrich the top README window if --readme is enabled.
  8. Compute the selected rank mode.
  9. Apply limit.
  10. Render output, optionally with --explain.
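
The dedupe step can be sketched as below. This is a minimal illustration, not the CLI's actual code: the function name and the dict-shaped candidates are hypothetical, and comparing full_name case-insensitively is an assumption based on GitHub treating owner/repo names as case-insensitive.

```python
from typing import Iterable


def dedupe_by_full_name(candidates: Iterable[dict]) -> list[dict]:
    """Keep the first occurrence of each repository, preserving retrieval order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for repo in candidates:
        # Assumption: case-insensitive comparison of owner/repo.
        key = repo["full_name"].lower()
        if key not in seen:
            seen.add(key)
            unique.append(repo)
    return unique


# Two fan-out stages that return an overlapping repository.
stage_one = [{"full_name": "octo/alpha"}, {"full_name": "octo/beta"}]
stage_two = [{"full_name": "Octo/Alpha"}, {"full_name": "octo/gamma"}]
merged = dedupe_by_full_name(stage_one + stage_two)
```

Keeping the first occurrence means a repository found by the seed search retains its original retrieval position, which matters for the native rank mode.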

Inspect

  1. Resolve effective host and PAT.
  2. Fetch repo metadata by explicit owner/repo.
  3. Optionally fetch README if --readme is enabled.
  4. Render details.

Storage Model

Secrets

  • PATs are stored per host in the OS credential store
  • env vars override stored credentials
  • insecure fallback storage is opt-in only
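
A minimal sketch of the lookup order described above, assuming a hypothetical environment variable naming scheme (GHSEARCH_TOKEN_<HOST>) and a caller-supplied function that reads the OS credential store. Neither name comes from the CLI itself.

```python
import re
from typing import Callable, Mapping, Optional


def env_var_for_host(host: str) -> str:
    # Hypothetical naming scheme: non-alphanumeric characters become "_".
    return "GHSEARCH_TOKEN_" + re.sub(r"[^A-Za-z0-9]", "_", host).upper()


def resolve_token(
    host: str,
    env: Mapping[str, str],
    read_store: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the PAT for a host; an env var wins over the stored credential."""
    from_env = env.get(env_var_for_host(host))
    if from_env:
        return from_env
    return read_store(host)


# A plain dict stands in for the OS credential store in this example.
store = {"github.com": "stored-token"}
token = resolve_token("github.com", {}, store.get)
```

Passing the environment and the store reader as arguments keeps the precedence rule testable without touching a real keychain.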

Config

  • non-secret settings are stored in a per-user config file
  • config never stores behavioral search defaults that would silently enable enhanced behavior

Cache / Index

  • v1 uses per-command ephemeral in-memory state only
  • no persistent repo metadata DB survives across commands
  • no persistent discovery index in default behavior

Scoring Model

Normalization

Each score component is normalized to a stable 0..1 range before ranking.
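
The page does not say which normalization function is used. One way to get a range that does not depend on the other candidates in the result set is log scaling against a fixed cap, sketched here as an assumption; the function name and the cap value are illustrative only.

```python
import math


def normalize_log(value: float, cap: float) -> float:
    """Map a non-negative count onto 0..1 using log scaling against a fixed cap."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    clamped = max(0.0, value)
    # Values at or above the cap saturate at 1.0.
    return min(1.0, math.log1p(clamped) / math.log1p(cap))


# Example: a star count normalized against an assumed cap of 1000.
star_score = normalize_log(10, cap=1000)
```

A fixed cap keeps a repository's component score the same across queries, whereas min-max scaling over the current candidates would change with every result set.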

Component Use

  • native rank: preserve retrieval order
  • query rank: query score only
  • activity rank: activity score only
  • quality rank: quality score only
  • blended rank: weighted average of query, activity, and quality
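
As a sketch of the blended mode, the weighted average below assumes each component is already normalized to 0..1. The function name, dict layout, and example weights are hypothetical.

```python
def blended_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of normalized component scores."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weight for name, weight in weights.items()) / total


scores = {"query": 1.0, "activity": 0.5, "quality": 0.0}
weights = {"query": 2.0, "activity": 1.0, "quality": 1.0}
blended = blended_score(scores, weights)  # (2*1.0 + 1*0.5 + 1*0.0) / 4
```

Dividing by the weight total keeps the blended score in 0..1 regardless of the scale of the weights, so it stays comparable to the single-component modes.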

Explainability

Every enhanced score must be explainable from explicit inputs and repo facts.
  • no learned personalization
  • no history-based bias

Progress And Concurrency

Progress

  • human progress output uses stderr
  • structured stdout remains clean
  • progress is phase-based:
    • searching
    • collecting
    • enriching metadata
    • enriching README
    • ranking

Concurrency

  • official GitHub guidance favors serial requests to avoid secondary rate limits
  • default concurrency is 1
  • --concurrency is advanced and explicit
  • on secondary-rate-limit responses, the client must:
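    • wait for the number of seconds in retry-after when that header is present
    • otherwise, when x-ratelimit-remaining is 0, wait until the time given by x-ratelimit-reset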
    • otherwise back off exponentially with jitter
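
The retry policy above can be sketched as a pure function that turns response headers into a wait time. The function name, the base and cap values, and the choice of full jitter are assumptions; checking x-ratelimit-remaining before trusting x-ratelimit-reset follows GitHub's published rate-limit guidance.

```python
import random
import time
from typing import Callable, Mapping, Optional


def retry_delay_seconds(
    headers: Mapping[str, str],
    attempt: int,
    now: Optional[float] = None,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Return how long to wait before retrying a rate-limited request."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # 1. retry-after is a number of seconds to wait.
    retry_after = lowered.get("retry-after", "")
    if retry_after.isdigit():
        return float(retry_after)

    # 2. x-ratelimit-reset is a UTC epoch timestamp in seconds.
    reset = lowered.get("x-ratelimit-reset", "")
    if lowered.get("x-ratelimit-remaining") == "0" and reset.isdigit():
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)

    # 3. Exponential backoff with full jitter (assumed variant).
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


delay = retry_delay_seconds({"Retry-After": "30"}, attempt=0)
```

Injecting the clock and the random source makes each branch deterministic under test.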

Host Handling

Normalization

Accept:
  • github.com
  • github.example.com
  • https://github.example.com
  • https://github.example.com/api/v3
Normalize into:
  • canonical web host
  • canonical REST API base URL
Defaults:
  • GitHub.com web host -> github.com
  • GitHub.com REST base -> https://api.github.com
  • GHES REST base -> https://<host>/api/v3
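
A minimal sketch of these rules, assuming that any supplied path is discarded, that the scheme is always normalized to https, and that api.github.com is treated as an alias for github.com. The Host type and function name are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Host:
    web_host: str  # canonical web host
    api_base: str  # canonical REST API base URL


def normalize_host(raw: str) -> Host:
    text = raw.strip()
    if "://" not in text:
        # Bare host names get a scheme so urlsplit can find the host part.
        text = "https://" + text
    parts = urlsplit(text)
    hostname = (parts.hostname or "").lower()
    if not hostname:
        raise ValueError(f"cannot determine host from {raw!r}")
    # Assumption: api.github.com is accepted as an alias for github.com.
    if hostname in ("github.com", "api.github.com"):
        return Host("github.com", "https://api.github.com")
    netloc = hostname if parts.port is None else f"{hostname}:{parts.port}"
    # Any path in the input, such as /api/v3, is dropped and rebuilt.
    return Host(netloc, f"https://{netloc}/api/v3")


ghes = normalize_host("https://github.example.com/api/v3")
```

With this shape, all four accepted input forms for the same server normalize to one value, which makes the web host usable as the key for host-scoped PAT lookup.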

Support Promise

  • GitHub.com is the primary target
  • custom hosts are best-effort in v1
  • the CLI sends a pinned API version header (X-GitHub-Api-Version) wherever the host supports it

Validation And Conflict Policy

First-Error Policy

  • validate flags before any network work where possible
  • report only the first error, with a clear message

Examples Of Enforced Conflicts

  • raw qualifier plus overlapping structured flag
  • --created-after plus --created-within
  • --weight-query without --rank blended
  • --concurrency without explicit discovery or enrichment mode
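
The first-error policy and the conflicts listed above can be sketched as one ordered check that runs before any network call. The error codes, the flag dictionary keys, and the "discover" key standing in for an explicit discovery mode are all hypothetical.

```python
from typing import Optional


def first_conflict(raw_query: str, flags: dict) -> Optional[str]:
    """Return the first conflict as 'CODE: message', or None if the input is valid."""
    # Qualifiers are tokens such as created:>2024-01-01 in the raw query.
    qualifiers = {
        token.split(":", 1)[0].lower()
        for token in raw_query.split()
        if ":" in token
    }
    created_flags = [name for name in ("created_after", "created_within") if name in flags]

    if "created" in qualifiers and created_flags:
        return "E_QUALIFIER_CONFLICT: raw created: qualifier overlaps a --created-* flag"
    if len(created_flags) == 2:
        return "E_FLAG_CONFLICT: --created-after cannot be combined with --created-within"
    if "weight_query" in flags and flags.get("rank") != "blended":
        return "E_FLAG_REQUIRES: --weight-query requires --rank blended"
    if "concurrency" in flags and not (flags.get("discover") or flags.get("readme")):
        return "E_FLAG_REQUIRES: --concurrency requires a discovery or enrichment mode"
    return None


error = first_conflict("rust cli", {"weight_query": 2.0, "rank": "native"})
```

Returning at the first failed check, rather than collecting every problem, is what keeps the output to a single error.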

Output Contracts

Success

  • pretty is human-first and concise
  • json is stable and structured
  • compact is minified JSON
  • csv is flat export-friendly output
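
To illustrate how the three machine-readable formats relate, here is a sketch of a renderer over flat rows. The function name and the example field names are hypothetical, sorting keys for stable output is an assumption, and the pretty format is left out because its layout is not specified here.

```python
import csv
import io
import json


def render(rows: list[dict], fmt: str) -> str:
    if fmt == "json":
        # Sorted keys keep the output stable between runs.
        return json.dumps(rows, indent=2, sort_keys=True)
    if fmt == "compact":
        # Same data with no insignificant whitespace.
        return json.dumps(rows, separators=(",", ":"), sort_keys=True)
    if fmt == "csv":
        buffer = io.StringIO()
        fieldnames = sorted({key for row in rows for key in row})
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


rows = [{"full_name": "octo/alpha", "stars": 5}]
compact_output = render(rows, "compact")
```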

Errors

  • stderr only
  • symbolic error code prefix
  • plain text message
  • shell exit code 1
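
A minimal sketch of this error contract, assuming the code and message are joined as "CODE: message". The function name and the example code are hypothetical.

```python
import sys
from typing import Optional, TextIO


def report_error(code: str, message: str, stream: Optional[TextIO] = None) -> int:
    """Write one plain-text error line to stderr and return the exit code."""
    target = sys.stderr if stream is None else stream
    print(f"{code}: {message}", file=target)
    return 1


# Typical use at the top level of the CLI:
#     sys.exit(report_error("E_AUTH", "no token configured for github.com"))
```

Because nothing is written to stdout on failure, a script that pipes json or csv output never has to separate error text from data.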