Admin Playbooks

These playbooks are practical, day-to-day operating procedures that apply the Observe → Recommend → Act model to common IBM i administration tasks. Each one is written for beginners and experienced admins alike, and emphasizes safety, verification, and learning while doing.

Use them as checklists during your shift. Every step references the exact tools you can ask the AI to run. Cross-links to the Glossary and Tools Reference are included so you can deepen your understanding on the fly.


1. Daily Health Check

Purpose: Establish a baseline of system health in under 5 minutes.
Time: 3–5 minutes
Risk level: Observe only — zero risk.

Beginner Context

A daily health check catches storage pressure, runaway jobs, and unanswered operator messages before they become incidents. The AI can run the entire sweep with one prompt and explain the results in plain language.

Steps

  1. Run the system health check
    Ask: Run a system health check and summarize the biggest risks.
    This calls inspect_system (view: "health") and returns storage, PTF, job, and security posture in one response.

  2. Review storage utilization
    Ask: Check storage usage summary and tell me whether capacity looks healthy.
    This calls inspect_storage (view: "summary") for the ASP utilization percentage and inspect_storage (view: "consumers") for the top consumers. (See ASP and Library in the Glossary.)

  3. Inspect active jobs
    Ask: Check active jobs and show the top CPU jobs with subsystem names.
    Look for jobs in MSGW or LCKW status and note their subsystem via inspect_jobs (view: "active"). (See Job, Subsystem).

  4. Triage operator messages
    Ask: Check QSYSOPR and tell me whether any messages still require attention.
    Unanswered inquiry messages (type *INQ) are the highest priority. This calls inspect_messages (view: "qsysopr"). (See QSYSOPR).

  5. Check recent system history
    Ask: Check recent system history log messages and call out high-severity items.
    This calls inspect_messages (view: "history_log") and surfaces hardware or software faults that may not yet have raised an operator message.
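
The five steps above can be written down as data, which is handy if you want to keep the sweep as a repeatable checklist. This is a minimal sketch: the tool names and view values come from the steps, but the ToolCall class and the request shape returned by as_request are assumptions made for illustration, not the platform's actual payload format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str    # tool name as used in the steps above
    view: str    # value of the tool's "view" argument
    prompt: str  # what you would ask the AI


STORAGE_PROMPT = "Check storage usage summary and tell me whether capacity looks healthy."

DAILY_HEALTH_CHECK = [
    ToolCall("inspect_system", "health",
             "Run a system health check and summarize the biggest risks."),
    ToolCall("inspect_storage", "summary", STORAGE_PROMPT),
    ToolCall("inspect_storage", "consumers", STORAGE_PROMPT),
    ToolCall("inspect_jobs", "active",
             "Check active jobs and show the top CPU jobs with subsystem names."),
    ToolCall("inspect_messages", "qsysopr",
             "Check QSYSOPR and tell me whether any messages still require attention."),
    ToolCall("inspect_messages", "history_log",
             "Check recent system history log messages and call out high-severity items."),
]


def as_request(call: ToolCall) -> dict:
    # Hypothetical payload shape; the real platform may structure requests differently.
    return {"tool": call.tool, "arguments": {"view": call.view}}
```

Every entry is an Observe call, so the list can be walked in order on production without an approval step.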

What Success Looks Like

  • All ASPs < 80% full.
  • No critical jobs in MSGW for > 5 minutes.
  • QSYSOPR has zero unanswered *INQ messages older than 15 minutes.
  • No PTF or security exposure flagged in the health summary.
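
The thresholds above are simple enough to check mechanically. The sketch below assumes you have already copied the relevant numbers out of the tool responses into a HealthSnapshot; that class and its field names are hypothetical and do not reflect the actual response format of any tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HealthSnapshot:
    # All fields are illustrative; fill them from the tool output by hand or by script.
    asp_percent_used: Dict[str, float] = field(default_factory=dict)       # ASP -> % full
    critical_msgw_minutes: Dict[str, float] = field(default_factory=dict)  # job -> minutes in MSGW
    unanswered_inq_minutes: List[float] = field(default_factory=list)      # age of each open *INQ
    flagged_exposures: List[str] = field(default_factory=list)             # PTF/security flags


def failed_checks(s: HealthSnapshot) -> List[str]:
    """Return one line per success criterion that is not met."""
    failures = []
    for asp, pct in s.asp_percent_used.items():
        if pct >= 80:  # criterion: all ASPs < 80% full
            failures.append(f"ASP {asp} is {pct:.0f}% full")
    for job, minutes in s.critical_msgw_minutes.items():
        if minutes > 5:  # criterion: no critical job in MSGW for > 5 minutes
            failures.append(f"{job} in MSGW for {minutes:.0f} min")
    stale = [m for m in s.unanswered_inq_minutes if m > 15]
    if stale:  # criterion: no unanswered *INQ older than 15 minutes
        failures.append(f"{len(stale)} unanswered *INQ message(s) older than 15 min")
    for item in s.flagged_exposures:
        failures.append(f"exposure flagged: {item}")
    return failures
```

An empty list means the shift starts from a clean baseline; anything else is a candidate for the Triage an Issue playbook.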

Teach Moments & Common Pitfalls

  • Pitfall: Ignoring QSYSOPR because “nothing looks urgent.” A single unanswered inquiry can pause an entire night batch stream.
  • Teach moment: High CPU in QBATCH often means a long-running report or a loop; use inspect_messages (view: "job_log") on the specific job (identified as number/user/name) to see the last 20 messages before deciding to end it.
  • Safety note: This playbook is 100% Observe. You can run it on production at any time.
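
A job log lookup needs one specific job, and on IBM i a job is identified by its qualified name in the form number/user/name (for example 123456/QUSER/QPADEV0001). The sketch below checks that shape before you paste it into a prompt. The function and class names are hypothetical, and the checks cover only segment count and length, not the full character rules for IBM i names.

```python
from typing import NamedTuple


class QualifiedJob(NamedTuple):
    number: str  # six-digit job number
    user: str    # user profile the job runs under
    name: str    # job name


def parse_qualified_job(text: str) -> QualifiedJob:
    """Split 'number/user/name' and apply basic length checks."""
    parts = text.strip().upper().split("/")
    if len(parts) != 3:
        raise ValueError("expected number/user/name")
    number, user, name = parts
    if not (number.isascii() and number.isdigit() and len(number) == 6):
        raise ValueError("job number must be six digits")
    if not (1 <= len(user) <= 10 and 1 <= len(name) <= 10):
        raise ValueError("user and job name must be 1-10 characters")
    return QualifiedJob(number, user, name)
```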

Related tools: inspect_system, inspect_storage, inspect_jobs, inspect_messages


2. Triage an Issue

Purpose: Move from “something feels wrong” to a confirmed root cause and a safe remediation plan.
Time: 10–20 minutes
Risk level: Observe first, then Recommend, then Act only after approval.

Beginner Context

Most incidents start with a vague symptom: “the system is slow,” “reports aren’t printing,” or “users can’t sign on.” This playbook forces you to observe broadly, then narrow, then plan the fix.

Steps

  1. Broad observation
    Ask: Run a system health check and summarize the biggest risks.
    Note any red flags in storage, jobs, or messages via inspect_system (view: "health").

  2. Narrow to the symptom
    • Slow interactive? → inspect_jobs (view: "active")
    • Reports stuck? → inspect_spool (view: "output_queue_entries")
    • Users locked out? → inspect_security (view: "login_activity")

  3. Get the job log (if a specific job is involved)
    Ask: Get the job log for job 123456/QUSER/QPADEV0001 and summarize the last 20 messages.
    This calls inspect_messages (view: "job_log"). Look for CPF or MCH errors, authority failures, or record-lock messages.

  4. Form a hypothesis and create a plan
    Once you have evidence, ask the AI to plan the fix:
    Plan ending job 123456/QUSER/QPADEV0001 with a controlled end and explain the risk.
    This calls manage_job (operation: "end", mode: "preview"). Review the generated ENDJOB command, the objects it will affect, and the previewId / previewToken it creates.

  5. Approve and act
    Human review in the Hub interface produces an approvalTicket. Once you have it, execute the approved change by invoking manage_job (operation: "end", mode: "execute") with that ticket.
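
The narrowing in step 2 is a lookup from symptom to the first tool worth running. A minimal sketch of that mapping follows; the symptom keys are made-up labels, and falling back to the broad health check for an unrecognized symptom is an assumption that mirrors step 1 rather than documented platform behavior.

```python
from typing import Tuple

# Symptom label (hypothetical) -> (tool, view) taken from step 2.
SYMPTOM_ROUTES = {
    "slow_interactive": ("inspect_jobs", "active"),
    "reports_stuck": ("inspect_spool", "output_queue_entries"),
    "users_locked_out": ("inspect_security", "login_activity"),
}


def first_tool_for(symptom: str) -> Tuple[str, str]:
    """Pick the first Observe call for a symptom; default to the broad health check."""
    return SYMPTOM_ROUTES.get(symptom, ("inspect_system", "health"))
```

For example, first_tool_for("reports_stuck") points at the spool view, while an unfamiliar symptom sends you back to the broad observation in step 1.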

What Success Looks Like

  • You can state the root cause in one sentence backed by live data.
  • A Change Plan exists and has been reviewed before any mutation.
  • The job log or message queue entry that triggered the issue is captured in your investigation notes.

Teach Moments & Common Pitfalls

  • Pitfall: Ending a job without reading its log first. The job may be waiting on a lock or a missing file; killing it can make recovery harder.
  • Teach moment: Many “slow system” tickets are actually one long-running batch job monopolizing CPU. The inspect_jobs (view: "active") tool surfaces this in seconds.
  • Safety note: Never run a state change without a preceding mode: "preview" step. The approval gate exists to protect you and the system.
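
The preview-before-execute rule can be pictured as a small gate that refuses to release a command unless a preview was recorded and an approval ticket is presented. This is a toy model for learning purposes only: the ApprovalGate class, its method names, and the ticket check are all hypothetical and say nothing about how the platform enforces the rule internally.

```python
class ApprovalGate:
    """Toy model: a command is released only after preview plus approval."""

    def __init__(self) -> None:
        self._previews = {}  # preview id -> command text shown to the reviewer

    def preview(self, preview_id: str, command: str) -> None:
        # Recording the preview is what makes a later execute possible.
        self._previews[preview_id] = command

    def execute(self, preview_id: str, approval_ticket: str) -> str:
        if preview_id not in self._previews:
            raise PermissionError("no preview recorded for this change")
        if not approval_ticket:
            raise PermissionError("approval ticket is required")
        # Each preview can be executed once; remove it after release.
        return self._previews.pop(preview_id)
```

The point of the model is the ordering: execute has nothing to release until preview has run, and a blank ticket is rejected even when a preview exists.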

Related tools: inspect_system, inspect_jobs, inspect_spool, inspect_security, inspect_messages, manage_job


3. User Provisioning

Purpose: Create or modify user profiles following least-privilege and audit-friendly patterns.
Time: 5–10 minutes (plus approval time)
Risk level: Recommend first, Act only with approval. Never grant *ALLOBJ lightly.

Beginner Context

User profiles are the root of access control, and over-privileged profiles are among the most common audit findings on IBM i. This playbook forces verification of existing authority before any change.

Steps

  1. Check existing authority (Observe)
    Ask: Check object authority on library QGPL for user DEMOUSR.
    This calls inspect_objects (view: "object_authority").
    Or: List user profiles with *ALLOBJ or *SECADM special authorities.
    This calls inspect_users (view: "special_authorities").

  2. Plan the new or changed profile (Recommend)
    Ask: Plan creation of user DEMOUSR with text "Demo operations user" and limited capabilities. Use user class *USER with no special authorities.
    This calls manage_user_profile (operation: "create", mode: "preview"). The plan will validate the name, suggest a default library, and show the exact CRTUSRPRF command.

  3. Review the plan
    Confirm the user will not receive *ALLOBJ, that the suggested default library is appropriate, and that a group profile (if used) is the correct role.

  4. Execute with approval (Act)
    After a second pair of eyes (or your own security officer role) approves, call manage_user_profile (operation: "create", mode: "execute") with the approval ticket.

  5. Verify (Observe again)
    Ask: Check user authorities for DEMOUSR and confirm *USE on QGPL.
    This calls inspect_users (view: "authorities").
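
Part of the step 3 review can be double-checked by scanning the planned CRTUSRPRF command text for risky parameters. The sketch below assumes the plan shows the command as a single string using the standard USRCLS and SPCAUT keywords; the helper names are hypothetical, and the parsing is deliberately naive, so treat it as a second pair of eyes and not a substitute for reading the plan.

```python
import re
from typing import List

HIGH_RISK = {"*ALLOBJ", "*SECADM"}  # the special authorities called out in step 1


def param_values(command: str, keyword: str) -> List[str]:
    """Return the space-separated values inside KEYWORD(...), or [] if absent."""
    match = re.search(rf"\b{keyword}\(([^)]*)\)", command.upper())
    return match.group(1).split() if match else []


def review_findings(command: str) -> List[str]:
    """List reasons the planned profile is more privileged than a plain *USER."""
    findings = []
    for auth in sorted(HIGH_RISK.intersection(param_values(command, "SPCAUT"))):
        findings.append(f"grants {auth}")
    user_class = param_values(command, "USRCLS")
    # An omitted USRCLS is not flagged here because CRTUSRPRF defaults to *USER.
    if user_class and user_class != ["*USER"]:
        findings.append(f"user class is {user_class[0]}, expected *USER")
    return findings
```

An empty result only means the two checks passed; group profile membership and the default library still need a human look.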

What Success Looks Like

  • New user has exactly the special authorities required for the role (usually none for user class *USER).
  • Group profile membership is documented.
  • An audit trail (approval request + command log) exists for compliance.

Teach Moments & Common Pitfalls

  • Pitfall: Granting *ALLOBJ “just for testing.” It is almost never needed, and a temporary grant tends to stay in place and become a standing audit exposure.
  • Teach moment: Use group profiles for application roles. Changing one group updates access for every member at once and makes audits much simpler.
  • Safety note: All user changes go through manage_user_profile in mode: "preview" first. The AI will never execute CRTUSRPRF or CHGUSRPRF without an explicit approval step.

Related tools: inspect_users, inspect_objects, manage_user_profile


4. PTF Management Cycle

Purpose: Keep the system current with security and functional fixes using the safe, grouped-PTF workflow.
Time: 15–30 minutes planning + apply window
Risk level: Observe + Recommend for ordering; Act only after cover-letter review and approval.

Beginner Context

PTFs fix defects and close vulnerabilities. Applying them individually is error-prone; groups ensure dependencies are met. The platform separates “order the group” from “apply the group.”

Steps

  1. Check current levels (Observe)
    Ask: Check OS release and installed PTF groups.
    This calls inspect_system (view: "ptf_groups").
    Also: List licensed programs and their release levels.
    This calls inspect_system (view: "licensed_programs").

  2. Research the target group (Recommend)
    Ask: Plan the CL command 'SNDPTFORD PTFID((SF99730))' to order the PTF group.
    This calls plan_run_ibmi_command to validate the CL command syntax, check prerequisites, and generate the preview that goes forward for approval.

  3. Review the plan
    Read the plan output and any relevant cover letters for special instructions, dependencies, and post-apply actions.

  4. Order the group (Act)
    After approval, execute the approved command using run_ibmi_command with the approvalTicket. The system will download the PTFs into a save file.

  5. Apply and verify (subsequent shift or maintenance window)
    Once the apply has completed, run inspect_system (view: "ptf_groups") again and confirm that the new levels are in place.
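
The verification in step 5 is a before-and-after comparison of group levels. A minimal sketch follows, assuming you have reduced both inspect_system (view: "ptf_groups") results to a mapping of group ID to level; that mapping is a hypothetical simplification, and the level numbers used in the example are invented purely for illustration.

```python
from typing import Dict, Optional, Tuple


def changed_levels(
    before: Dict[str, int], after: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], int]]:
    """Map each group whose level differs to (old level or None, new level)."""
    return {
        group: (before.get(group), level)
        for group, level in after.items()
        if before.get(group) != level
    }


def target_reached(after: Dict[str, int], group: str, expected_level: int) -> bool:
    """True when the group is installed at or above the level you planned for."""
    return after.get(group, -1) >= expected_level
```

For example, with made-up levels, changed_levels({"SF99730": 1}, {"SF99730": 2}) reports that SF99730 moved from 1 to 2, which is the evidence to attach to the change record.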

What Success Looks Like

  • Current PTF group level matches the latest cumulative or security group published by IBM.
  • No prerequisite PTFs are missing.
  • A documented approval and cover-letter review exists for the change.

Teach Moments & Common Pitfalls

  • Pitfall: Ordering a group without checking the cover letter. Some groups require a specific order or temporary changes to system values.
  • Teach moment: Cumulative PTF packages (SF99730 etc.) are the safest way to stay current. Individual PTFs are for emergency fixes only.
  • Safety note: Direct CL commands run only through plan_run_ibmi_command followed by run_ibmi_command, which keeps every command visible and behind a human authorization gate.

Related tools: inspect_system, plan_run_ibmi_command, run_ibmi_command, get_ibmi_procedure_help


How to Use These Playbooks

  • Start every shift with the Daily Health Check.
  • When something feels off, run the Triage an Issue playbook before you touch anything.
  • User or security changes? Follow the User Provisioning sequence.
  • Maintenance window approaching? Use the PTF Management Cycle.

Each playbook is deliberately short so you can follow it while the AI is running the tools in the background. After you complete a playbook, open the Glossary for any term you want to understand more deeply, or jump to the Self-Study Runbook for guided practice exercises.

Remember: the AI proposes, you decide, the platform records. This is how IBM i administration stays both fast and safe while you learn the system.

Last updated: 2026-05-21