Big questions

For whom are you building the software? Whose opinions about it matter?

Insight into the state of our CI cluster.

It should collect information about the CI cluster and present it in a digestible form.

Three main components:

Data collection. Collect information about CI from various sources and store it in a central location. Sources:
- Machines
- GitLab API
- GitLab CI logs
- CDash
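Below is a minimal sketch of the GitLab API source. It assumes Python with only the standard library, the GitLab REST API v4 jobs endpoint (`GET /projects/:id/jobs`) authenticated with a `PRIVATE-TOKEN` header, and SQLite standing in for the "central location". `JobRecord`, `fetch_jobs`, `to_record`, and `store` are hypothetical names. The other sources (machines, CI logs, CDash) are not covered.

```python
import json
import sqlite3
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRecord:
    """Hypothetical normalized record for one CI job."""
    job_id: int
    project_id: int
    name: str
    status: str
    runner: Optional[str]
    failure_reason: Optional[str]


def fetch_jobs(base_url: str, project_id: int, token: str, page: int = 1) -> list:
    """Fetch one page (up to 100 jobs) from GET /projects/:id/jobs."""
    query = urllib.parse.urlencode({"per_page": 100, "page": page})
    url = f"{base_url}/api/v4/projects/{project_id}/jobs?{query}"
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_record(project_id: int, job: dict) -> JobRecord:
    """Keep only the fields the dashboards need; runner is null for jobs that never started."""
    runner = job.get("runner") or {}
    return JobRecord(
        job_id=job["id"],
        project_id=project_id,
        name=job["name"],
        status=job["status"],
        runner=runner.get("description"),
        failure_reason=job.get("failure_reason"),
    )


def store(conn: sqlite3.Connection, records: List[JobRecord]) -> None:
    """Upsert records into a single table acting as the central store."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               job_id INTEGER PRIMARY KEY, project_id INTEGER, name TEXT,
               status TEXT, runner TEXT, failure_reason TEXT)"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?, ?, ?)",
        [(r.job_id, r.project_id, r.name, r.status, r.runner, r.failure_reason)
         for r in records],
    )
    conn.commit()
```

A collector would call `fetch_jobs` page by page until it returns an empty list, map each job through `to_record`, and pass the result to `store`. Keying the upsert on the job ID keeps repeated polling idempotent.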
Dashboards. An up-to-date view of the system's status. Three main viewpoints:
- Runner-focused
- Project-focused
- Schedule-focused (status of scheduled jobs)
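The three viewpoints could share one aggregation step that groups jobs by a chosen key and counts statuses. The sketch below assumes job records are plain dicts carrying a `status` plus `runner`, `project`, and `schedule` keys. These field names are hypothetical, and `summarize` and `failure_rate` are illustrative helpers, not part of any existing tool.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def summarize(jobs: Iterable[dict], key: str) -> Dict[str, Counter]:
    """Count job statuses per value of `key` (e.g. "runner", "project", "schedule")."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for job in jobs:
        # Jobs without a value for the key (e.g. not yet picked up by a runner)
        # are grouped under a placeholder label.
        group = job.get(key) or "(none)"
        summary[group][job["status"]] += 1
    return dict(summary)


def failure_rate(counts: Counter) -> float:
    """Fraction of finished jobs that failed; running and pending jobs are ignored."""
    finished = counts["success"] + counts["failed"]
    return counts["failed"] / finished if finished else 0.0
```

A runner-focused view would call `summarize(jobs, "runner")` and sort runners by `failure_rate`, so that a machine failing most of its jobs stands out.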
Analysis. Analyze the real-time status:
- Classify failed jobs based on the cause (e.g., machine problems, code problems, network issues)
- For transient or machine problems, report them and, if confidence is high enough, mitigate them (e.g., reboot the machine or restart the job)
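Classification could start as ordered regular-expression rules over job logs, with mitigation gated on a confidence threshold. Everything in this sketch is hypothetical: the categories, patterns, confidence values, the 0.85 threshold, and the action names are placeholders that would need tuning against real failed-job logs. An actual deployment would also need code that performs the reboot or restart.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical rules: (category, pattern, confidence), ordered from most to
# least specific. Real patterns would come from inspecting failed-job logs.
RULES: List[Tuple[str, re.Pattern, float]] = [
    ("machine", re.compile(r"No space left on device|Cannot allocate memory"), 0.9),
    ("network", re.compile(r"Could not resolve host|Connection timed out|connection reset",
                           re.IGNORECASE), 0.8),
    ("code", re.compile(r"error: |FAILED|Assertion .* failed"), 0.6),
]

# Categories treated as transient, i.e. candidates for automatic mitigation.
TRANSIENT = {"machine", "network"}


@dataclass
class Classification:
    category: str
    confidence: float


def classify(log: str) -> Classification:
    """Return the category of the first rule whose pattern appears in the log."""
    for category, pattern, confidence in RULES:
        if pattern.search(log):
            return Classification(category, confidence)
    return Classification("unknown", 0.0)


def decide_action(result: Classification, threshold: float = 0.85) -> str:
    """Report every transient failure; act automatically only above the threshold."""
    if result.category not in TRANSIENT:
        return "none"
    if result.confidence < threshold:
        return "report"
    return "reboot_machine" if result.category == "machine" else "restart_job"
```

Keeping the rules as data makes it easy to add patterns as new failure modes appear. Falling back to "report" below the threshold keeps the automatic response conservative.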

Dashboards are critical. Analysis and response are nice-to-have.
