Lessons Learned

⚠️ Engineering Notes – Non-Normative Document

This document collects lessons learned during development. It reflects reasoning, mistakes, and decisions observed over time.

It is not a specification, not a contract, and not a source of truth for system behavior.

Its purpose is to:

  • preserve engineering context
  • document recurring pitfalls
  • support future design decisions
  • reduce cognitive load when revisiting past work

The contents may be refined or become obsolete as the project evolves.

Scope

This document captures generalized, transferable lessons derived from real development and debugging sessions.

All project-specific details have been removed. The focus is on engineering practice, not on a specific codebase.


1. Process & Decision Making


L0001 – Context must be explicit

Implicit assumptions cause more failures than incorrect code.

Before any technical work begins, the following must be clearly defined:

  • scope
  • constraints
  • goals
  • non-goals


L0002 – Decisions must precede implementation

Writing code before documenting decisions leads to:

  • scope creep
  • accidental redesigns
  • inconsistent behavior

A written decision acts as a constraint, not bureaucracy.


L0003 – Structure must exist before automation

Automation applied to an undefined structure creates fragile systems.

Correct order, from first to last:

  • define structure
  • validate assumptions
  • document decisions
  • automate

L0004 – Small steps outperform large refactors

Incremental changes:

  • are easier to validate
  • reduce regression risk
  • preserve intent

Large refactors without checkpoints increase uncertainty.


2. Documentation as a System Component


L0010 – Documentation is part of the architecture

Documentation defines:

  • intent
  • boundaries
  • invariants
  • evolution constraints

If documentation diverges from behavior, the system becomes unstable.


L0011 – Decisions require lifecycle management

Decision records must be:

  • versioned
  • append-only
  • individually addressable
  • traceable over time

Single monolithic documents do not scale.
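
As a rough illustration of these properties, the sketch below models an append-only log of decision records. The names `Decision`, `DecisionLog`, and the `D-001` style identifiers are hypothetical and chosen only for this example. Storing one file per decision under version control is another way to get the same properties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    """One immutable decision record (hypothetical structure)."""
    id: str                           # individually addressable
    title: str
    supersedes: Optional[str] = None  # link to the record this one replaces


class DecisionLog:
    """Append-only collection: records are added, never edited or removed."""

    def __init__(self) -> None:
        self._records: list[Decision] = []

    def append(self, decision: Decision) -> None:
        if any(r.id == decision.id for r in self._records):
            raise ValueError(
                f"{decision.id} already exists; add a new record that supersedes it"
            )
        self._records.append(decision)

    def current(self) -> list[Decision]:
        """Return the records that have not been superseded."""
        superseded = {r.supersedes for r in self._records if r.supersedes}
        return [r for r in self._records if r.id not in superseded]


log = DecisionLog()
log.append(Decision("D-001", "Use INI files for configuration"))
log.append(Decision("D-002", "Use TOML for configuration", supersedes="D-001"))
```

The superseded record stays in the log, so the history of the decision remains traceable.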


L0012 – Documentation must match reality

If behavior changes, documentation must change in the same commit.

Outdated documentation is a form of technical debt.


3. Tooling & Automation


L0020 – Tooling is production code

Developer tooling must be:

  • versioned
  • reviewed
  • deterministic
  • documented

Temporary scripts almost never remain temporary.


L0021 – Automation must be explicit

Implicit automation leads to:

  • hidden dependencies
  • irreproducible builds
  • fragile workflows

Automation must declare:

  • inputs
  • outputs
  • failure modes
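
One way to make such a declaration concrete is to describe each automation step as data that can be checked before it runs. This is a minimal sketch. `StepSpec`, `missing_inputs`, and the file paths are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StepSpec:
    """Declaration of one automation step (hypothetical structure)."""
    name: str
    inputs: tuple[str, ...]         # files the step reads
    outputs: tuple[str, ...]        # files the step writes
    failure_modes: tuple[str, ...]  # documented ways the step can fail


def missing_inputs(spec: StepSpec, root: Path) -> list[str]:
    """Return the declared inputs that do not exist under root."""
    return [p for p in spec.inputs if not (root / p).exists()]


# Example declaration; the paths are assumptions made for this sketch.
BUILD_DOCS = StepSpec(
    name="build-docs",
    inputs=("docs/index.md",),
    outputs=("site/index.html",),
    failure_modes=("missing input file", "renderer exits non-zero"),
)
```

A runner could call `missing_inputs` as a preflight check and refuse to start when the list is not empty.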

L0022 – CI is an architectural validator

CI failures often indicate:

  • design inconsistencies
  • missing contracts
  • undocumented assumptions

CI is feedback on architecture, not just correctness.


4. API, CLI, and Interface Design


L0030 – Contracts must be explicit

If behavior is not defined, each component invents its own rules.

This applies to:

  • CLI flags
  • return codes
  • logging behavior
  • configuration semantics

L0031 – CLI output has semantic layers

Output must be classified as:

  • user-facing output
  • diagnostic information
  • debug/trace data
  • errors

Mixing these layers makes behavior untestable.


L0032 – Quiet and verbose modes must be strict

Rules:

  • quiet → no output except errors
  • verbose → additive only
  • no behavioral change based on verbosity

Anything else creates ambiguity.
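
The rules above can be sketched as a small output router. `Reporter` and its method names are hypothetical, and sending diagnostics to stderr is an assumption of this sketch. Verbosity only decides which layers are written; it never changes what the program does.

```python
import io
import sys


class Reporter:
    """Routes messages by semantic layer (hypothetical helper, not a library API)."""

    QUIET, NORMAL, VERBOSE = -1, 0, 1

    def __init__(self, verbosity=NORMAL, out=None, err=None):
        self.verbosity = verbosity
        self.out = out if out is not None else sys.stdout
        self.err = err if err is not None else sys.stderr

    def info(self, message):
        # User-facing output: suppressed only in quiet mode.
        if self.verbosity >= self.NORMAL:
            print(message, file=self.out)

    def detail(self, message):
        # Diagnostic output: additive, shown only in verbose mode.
        if self.verbosity >= self.VERBOSE:
            print(message, file=self.err)

    def error(self, message):
        # Errors are always written, regardless of verbosity.
        print(f"error: {message}", file=self.err)


def demo(verbosity):
    """Run the same three calls at a given verbosity and capture both streams."""
    out, err = io.StringIO(), io.StringIO()
    reporter = Reporter(verbosity, out=out, err=err)
    reporter.info("3 files processed")
    reporter.detail("cache hit for 2 files")
    reporter.error("1 file unreadable")
    return out.getvalue(), err.getvalue()
```

Because the streams are injected, each mode can be asserted in a test without capturing the real stdout or stderr.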


L0033 – CLI logic must be testable

CLI implementations should:

  • separate parsing from execution
  • avoid sys.exit() in business logic
  • return structured results

Testability must be designed, not patched later.
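
A minimal sketch of this separation, using `argparse` from the standard library. The `greet` program, the `Result` type, and the exit code `2` for invalid input are assumptions made for illustration.

```python
import argparse
import sys
from dataclasses import dataclass


@dataclass
class Result:
    """Structured outcome returned by the business logic."""
    exit_code: int
    message: str


def parse_args(argv):
    """Parsing only: turns argv into a namespace, runs no business logic."""
    parser = argparse.ArgumentParser(prog="greet")  # hypothetical tool name
    parser.add_argument("name")
    return parser.parse_args(argv)


def run(name: str) -> Result:
    """Business logic: no printing, no sys.exit(), just a structured result."""
    if not name.strip():
        return Result(exit_code=2, message="name must not be empty")
    return Result(exit_code=0, message=f"hello, {name}")


def main(argv=None) -> int:
    """Thin glue: parse, execute, report, and return the exit code."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    result = run(args.name)
    stream = sys.stdout if result.exit_code == 0 else sys.stderr
    print(result.message, file=stream)
    return result.exit_code

# The process entry point would be the only caller of sys.exit(),
# for example: sys.exit(main())
```

With this layout, `run` can be tested by comparing `Result` values, and `main` can be tested by passing an explicit argument list.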


L0034 – CLI behavior must follow platform conventions

CLI behavior must follow established platform conventions, including:

  • separation of stdout and stderr
  • exit code semantics
  • argument parsing rules
  • error reporting conventions

Violating these expectations leads to fragile tests and user confusion.

Flags such as --quiet must only affect informational output and must not suppress errors or diagnostics emitted by the runtime or argument parser.
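
The last point can be checked directly. In the sketch below, the parser and the `parse_capturing` helper are hypothetical; the behavior being observed is that of the standard `argparse` module, which reports usage errors on stderr and exits with status 2 even when `--quiet` is passed.

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tool")  # hypothetical program name
    parser.add_argument("--quiet", action="store_true")
    parser.add_argument("path")
    return parser


def parse_capturing(argv):
    """Parse argv and return (exit_code, stdout_text, stderr_text)."""
    out, err = io.StringIO(), io.StringIO()
    code = 0
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            build_parser().parse_args(argv)
        except SystemExit as exc:
            # argparse exits the process on usage errors; capture the code.
            code = exc.code
    return code, out.getvalue(), err.getvalue()


# The required "path" argument is missing, so the parser reports an error
# on stderr even though --quiet was requested.
code, out, err = parse_capturing(["--quiet"])
```

A test that expects `--quiet` to silence this message would be asserting against the platform convention rather than against the tool.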


5. Architecture & Code Structure


L0040 – Separation of concerns must be enforced

Logic, orchestration, I/O, and diagnostics must not be mixed.

Each layer must have a single responsibility.


L0041 – Registries are catalogs, not execution plans

A registry describes what exists.

Execution must be:

  • explicit
  • filtered
  • intentional

Never assume that everything registered must run.
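
A small sketch of the distinction, with hypothetical task names: registration only records what exists, and a separate function runs the tasks that were explicitly selected.

```python
from typing import Callable

# The registry is a catalog: name -> callable. Registering runs nothing.
REGISTRY: dict[str, Callable[[], str]] = {}


def register(name: str):
    """Decorator that adds a task to the catalog."""
    def decorator(func: Callable[[], str]) -> Callable[[], str]:
        REGISTRY[name] = func
        return func
    return decorator


@register("lint")
def lint() -> str:
    return "lint ok"


@register("deploy")
def deploy() -> str:
    return "deployed"


def run_selected(names: list[str]) -> dict[str, str]:
    """Run only the tasks named by the caller, in the given order."""
    unknown = [n for n in names if n not in REGISTRY]
    if unknown:
        raise KeyError(f"unknown tasks: {unknown}")
    return {n: REGISTRY[n]() for n in names}
```

Calling `run_selected(["lint"])` leaves `deploy` untouched even though it is registered.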


L0042 – Schema evolution must be explicit

Schemas must:

  • declare versions
  • evolve intentionally
  • never infer structure implicitly

Schema versioning is part of API stability.
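
As an illustration, a loader can refuse to guess and migrate only between versions it knows. The field names (`schema_version`, `name`, `title`) and the v1 to v2 change are invented for this sketch.

```python
SUPPORTED_VERSIONS = {1, 2}  # assumed set of versions for this example


def load_config(data: dict) -> dict:
    """Return data in the current (v2) layout, or raise if the version is unclear."""
    version = data.get("schema_version")
    if version is None:
        # Never infer the structure from which keys happen to be present.
        raise ValueError("missing schema_version; refusing to guess the structure")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    if version == 1:
        # Hypothetical migration: v1 called the field "name", v2 calls it "title".
        return {"schema_version": 2, "title": data["name"]}
    return dict(data)
```

Each future version would add one explicit migration step instead of more key sniffing.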


6. Testing & Validation


L0050 – Tests must validate intent, not implementation

Good tests:

  • verify observable behavior
  • assert contracts
  • survive refactors

Bad tests:

  • mirror internal structure
  • depend on private state
  • break on redesign
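
The contrast can be shown on a small, hypothetical class. The first test asserts the contract and would survive a change of internal data structure. The second mirrors the implementation and would break on such a change even though the behavior is unchanged.

```python
from collections import deque


class RecentItems:
    """Keeps the most recent `limit` items (hypothetical example class)."""

    def __init__(self, limit: int) -> None:
        self._buffer = deque(maxlen=limit)

    def add(self, item) -> None:
        self._buffer.append(item)

    def items(self) -> list:
        return list(self._buffer)


def test_keeps_only_most_recent_items():
    # Good: verifies observable behavior through the public interface.
    recent = RecentItems(limit=2)
    for item in ("a", "b", "c"):
        recent.add(item)
    assert recent.items() == ["b", "c"]


def test_buffer_is_a_deque():
    # Brittle: depends on private state and breaks if the storage changes.
    assert isinstance(RecentItems(limit=2)._buffer, deque)
```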

L0051 – Coverage is a signal, not a goal

High coverage does not imply correctness.

Coverage must be interpreted:

  • per module
  • by responsibility
  • in context

L0052 – Integration paths should not be unit-tested aggressively

I/O-heavy or environment-dependent code should be validated via:

  • integration checks
  • preflight validation
  • controlled manual verification

Not everything should be unit-tested.


L0053 – Tests expose architectural flaws early

Many design issues emerge only when writing tests.

Tests act as a design feedback mechanism.


L0054 – Tests must not depend on environment state

Tests depending on:

  • filesystem layout
  • installed tools
  • OS-specific behavior

are not reliable unit tests.
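
One common way to remove such a dependency is to inject the environment lookup. In this sketch, `tool_available` is a hypothetical function; `shutil.which` is the standard-library default, and the test replaces it with a plain dictionary lookup.

```python
import shutil
from typing import Callable, Optional


def tool_available(
    name: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """Report whether a tool can be found, using an injectable lookup."""
    return which(name) is not None


def test_tool_available_with_fake_lookup():
    # The fake lookup makes the result independent of the machine's PATH.
    fake_which = {"git": "/usr/bin/git"}.get
    assert tool_available("git", which=fake_which)
    assert not tool_available("svn", which=fake_which)
```

The real lookup is still exercised, but by an integration check rather than by the unit test.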


L0055 – Failing tests are more valuable than passing ones

A failing test often reveals:

  • incorrect assumptions
  • undocumented behavior
  • mismatched expectations
  • hidden coupling between components

Test failures should trigger a review of intent before any code change. They frequently expose design issues rather than implementation bugs.


7. Logging & Observability


L0060 – Logging must be designed, not improvised

Logging should be:

  • structured
  • centralized
  • predictable

Ad-hoc logging creates noise and instability.


L0061 – Log levels must have strict meaning

Each level must correspond to a specific intent.

If TRACE exists, it must be explicitly enabled.
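
With Python's standard `logging` module, a TRACE level could be sketched as follows. The numeric value `5`, the logger name, and the `make_logger` helper are assumptions of this example. TRACE records are emitted only when the caller opts in.

```python
import io
import logging

TRACE = 5  # assumed value, below logging.DEBUG (10)
logging.addLevelName(TRACE, "TRACE")


def make_logger(stream, enable_trace: bool = False) -> logging.Logger:
    """Build a logger that writes to stream; TRACE is off unless requested."""
    logger = logging.getLogger("lessons.example")  # hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(TRACE if enable_trace else logging.INFO)
    return logger


buf = io.StringIO()
log = make_logger(buf)
log.log(TRACE, "entering parser")  # dropped: TRACE was not enabled
log.info("started")                # written
```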


L0062 – Observability beats guesswork

Systems should expose:

  • decision points
  • intermediate state
  • failure context

Debugging without visibility leads to speculation.


L0063 – Debugging requires control, not intuition

Effective debugging requires:

  • deterministic reproduction
  • explicit breakpoints
  • controlled execution flow
  • observable state transitions

Guesswork and ad-hoc logging are poor substitutes for structured inspection. Debugging tools should support reasoning, not replace it.


8. Versioning & Release Discipline


L0070 – Version must have a single source of truth

Version information must come from:

  • tags
  • release metadata

Never from duplicated constants.
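
In a Python package, one way to avoid a duplicated constant is to read the version from the installed distribution's metadata at runtime. This sketch assumes the build process writes that metadata from the release tag. `get_version` and the fallback string are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name: str) -> str:
    """Return the installed distribution's version, or a marked fallback."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Not installed (for example, running from a source checkout).
        return "0.0.0+unknown"
```

The code then has no version literal to keep in sync with the tag.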


L0071 – CI is authoritative for releases

Local environments must not control:

  • version numbers
  • releases
  • publication logic

L0072 – Security rules must be documented

Security behavior must be:

  • explicit
  • documented
  • enforced by tooling

Implicit security assumptions tend to fail over time.


9. Process-Level Principles


L0080 – Assumptions are liabilities

Every assumption must be:

  • stated
  • verified
  • documented

Unstated assumptions accumulate risk.


L0081 – Not fixing something can be correct

Fixing the wrong thing causes more damage than leaving it unchanged.

Restraint is an engineering skill.


L0082 – Decisions reduce cognitive load

Writing decisions:

  • prevents repeated debates
  • preserves rationale
  • accelerates future work

10. Core Principles


L0090 – Structure before automation


L0091 – Documentation before tooling


L0092 – Decisions before code

These principles prevent most long-term failures.


End of Document