Admin Playbooks
These playbooks are practical, day-to-day operating procedures that apply the Observe → Recommend → Act model to common IBM i administration tasks. Each playbook is written for beginners and experienced admins alike. They emphasize safety, verification, and learning while doing.
Use them as checklists during your shift. Every step references the exact tools you can ask the AI to run. Cross-links to the Glossary and Tools Reference are included so you can deepen your understanding on the fly.
1. Daily Health Check
Purpose: Establish a baseline of system health in under 5 minutes.
Time: 3–5 minutes
Risk level: Observe only — zero risk.
Beginner Context
A daily health check catches storage pressure, runaway jobs, and unanswered operator messages before they become incidents. The AI can run the entire sweep with one prompt and explain the results in plain language.
Steps
- Run the system health check
  Ask: Run a system health check and summarize the biggest risks.
  This calls inspect_system(view:"health") and returns storage, PTF, job, and security posture in one response.
- Review storage utilization
  Ask: Check storage usage summary and tell me whether capacity looks healthy.
  Focus on ASP utilization percentage and top consumers via inspect_storage(view:"summary") and inspect_storage(view:"consumers"). (See ASP and Library in the glossary.)
- Inspect active jobs
  Ask: Check active jobs and show the top CPU jobs with subsystem names.
  Look for jobs in MSGW or LCKW status and note their subsystem via inspect_jobs(view:"active"). (See Job, Subsystem.)
- Triage operator messages
  Ask: Check QSYSOPR and tell me whether any messages still require attention.
  Unanswered inquiry messages (type *INQ) are the highest priority. This calls inspect_messages(view:"qsysopr"). (See QSYSOPR.)
- Check recent system history
  Ask: Check recent system history log messages and call out high-severity items.
  This calls inspect_messages(view:"history_log") and surfaces hardware or software faults that may not yet have raised an operator message.
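The five steps above can be sketched as one ordered sweep. The tool names and view values come from the steps themselves; the (tool, arguments) tuple shape and the call_tool callable are assumptions made for illustration, not the platform's actual client interface.

```python
from typing import Any, Callable

# Tool names and views are taken from the playbook steps.
# The tuple layout and the call_tool callable are hypothetical.
DAILY_HEALTH_SWEEP: list[tuple[str, dict[str, str]]] = [
    ("inspect_system", {"view": "health"}),
    ("inspect_storage", {"view": "summary"}),
    ("inspect_storage", {"view": "consumers"}),
    ("inspect_jobs", {"view": "active"}),
    ("inspect_messages", {"view": "qsysopr"}),
    ("inspect_messages", {"view": "history_log"}),
]


def run_daily_sweep(call_tool: Callable[[str, dict[str, str]], Any]) -> dict[str, Any]:
    """Run every Observe call in order and key each result by 'tool:view'."""
    results: dict[str, Any] = {}
    for tool, arguments in DAILY_HEALTH_SWEEP:
        results[f"{tool}:{arguments['view']}"] = call_tool(tool, arguments)
    return results
```

Because every entry is an Observe call, the sweep never needs an approval step; anything that changes state would not belong in this list.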
What Success Looks Like
- All ASPs < 80% full.
- No critical jobs in MSGW for > 5 minutes.
- QSYSOPR has zero unanswered *INQ messages older than 15 minutes.
- No PTF or security exposure flagged in the health summary.
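The thresholds above lend themselves to a mechanical check. The sketch below is one way to encode them, assuming a simplified HealthSnapshot shape that is purely hypothetical; the real tool output will be structured differently and would need to be mapped into it.

```python
from dataclasses import dataclass, field


@dataclass
class HealthSnapshot:
    """Hypothetical, simplified view of the health check results."""
    asp_percent_used: dict[str, float] = field(default_factory=dict)
    critical_msgw_minutes: list[float] = field(default_factory=list)
    unanswered_inq_minutes: list[float] = field(default_factory=list)
    exposures_flagged: int = 0


def failed_checks(snapshot: HealthSnapshot) -> list[str]:
    """Return one line per success criterion that is not met."""
    failures: list[str] = []
    for asp, percent in sorted(snapshot.asp_percent_used.items()):
        if percent >= 80.0:  # criterion: all ASPs below 80% full
            failures.append(f"ASP {asp} is {percent:.0f}% full")
    if any(minutes > 5 for minutes in snapshot.critical_msgw_minutes):
        failures.append("critical job in MSGW for more than 5 minutes")
    if any(minutes > 15 for minutes in snapshot.unanswered_inq_minutes):
        failures.append("unanswered *INQ message older than 15 minutes")
    if snapshot.exposures_flagged > 0:
        failures.append("PTF or security exposure flagged")
    return failures
```

An empty list means the baseline is healthy; anything else is a prompt to continue with the Triage an Issue playbook.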
Teach Moments & Common Pitfalls
- Pitfall: Ignoring QSYSOPR because “nothing looks urgent.” A single unanswered inquiry can pause an entire night batch stream.
- Teach moment: High CPU in QBATCH often means a long-running report or a loop; use inspect_messages(view:"job_log") on the specific job number to see the last 20 messages before deciding to end it.
- Safety note: This playbook is 100% Observe. You can run it on production at any time.
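When reading the last 20 job log messages, the entries that usually matter first are those whose message ID starts with CPF (operating system) or MCH (machine-level). A minimal sketch of that filter follows, assuming each job log entry is a dict with an "id" key; that shape is hypothetical and the message IDs in the usage note are only illustrative.

```python
def error_messages(job_log: list[dict], limit: int = 20) -> list[dict]:
    """Return CPF/MCH entries among the last `limit` job log messages.

    Assumes (hypothetically) that each entry carries its message ID under 'id'.
    Entries without an ID are skipped.
    """
    recent = job_log[-limit:]
    return [m for m in recent if m.get("id", "").startswith(("CPF", "MCH"))]
```

For example, a log holding CPC1221, CPF9801, and MCH3601 would yield only the last two, which is a reasonable starting point before deciding whether to end the job.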
Related tools: inspect_system, inspect_storage, inspect_jobs, inspect_messages
2. Triage an Issue
Purpose: Move from “something feels wrong” to a confirmed root cause and a safe remediation plan.
Time: 10–20 minutes
Risk level: Observe first, then Recommend, then Act only after approval.
Beginner Context
Most incidents start with a vague symptom: “the system is slow,” “reports aren’t printing,” or “users can’t sign on.” This playbook forces you to observe broadly, then narrow, then plan the fix.
Steps
- Broad observation
  Ask: Run a system health check and summarize the biggest risks.
  Note any red flags in storage, jobs, or messages via inspect_system(view:"health").
- Narrow to the symptom
  - Slow interactive? → inspect_jobs(view:"active")
  - Reports stuck? → inspect_spool(view:"output_queue_entries")
  - Users locked out? → inspect_security(view:"login_activity")
- Get the job log (if a specific job is involved)
  Ask: Get the job log for job 123456/QUSER/QPADEV0001 and summarize the last 20 messages.
  This calls inspect_messages(view:"job_log"). Look for CPF or MCH errors, authority failures, or record-lock messages.
- Form a hypothesis and create a plan
  Once you have evidence, ask the AI to plan the fix:
  Plan ending job 123456/QUSER/QPADEV0001 with a controlled end and explain the risk.
  This calls manage_job(operation:"end", mode:"preview"). Review the generated ENDJOB command, the objects it will affect, and the previewId/previewToken it creates.
- Approve and act
  After human review in the Hub interface (generating an approvalTicket), execute the approved change by invoking manage_job(operation:"end", mode:"execute").
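The "narrow to the symptom" step is essentially a lookup from symptom to the next Observe call. It might be sketched as below; the tool and view names come from the step, while the symptom keys and the fallback to the broad health check are assumptions added for illustration.

```python
# Tool and view names are from the playbook; the symptom keys are hypothetical labels.
SYMPTOM_ROUTES: dict[str, tuple[str, str]] = {
    "slow_interactive": ("inspect_jobs", "active"),
    "reports_stuck": ("inspect_spool", "output_queue_entries"),
    "users_locked_out": ("inspect_security", "login_activity"),
}


def next_observation(symptom: str) -> tuple[str, str]:
    """Pick the (tool, view) to run next; unknown symptoms get the broad check."""
    return SYMPTOM_ROUTES.get(symptom, ("inspect_system", "health"))
```

Falling back to inspect_system(view:"health") keeps an unrecognized symptom in the Observe phase rather than guessing at a narrower tool.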
What Success Looks Like
- You can state the root cause in one sentence backed by live data.
- A Change Plan exists and has been reviewed before any mutation.
- The job log or message queue entry that triggered the issue is captured in your investigation notes.
Teach Moments & Common Pitfalls
- Pitfall: Ending a job without reading its log first. The job may be waiting on a lock or a missing file; killing it can make recovery harder.
- Teach moment: Many “slow system” tickets are actually one long-running batch job monopolizing CPU. The inspect_jobs(view:"active") tool surfaces this in seconds.
- Safety note: Never run a state change without a preceding mode: "preview" step. The approval gate exists to protect you and the system.
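The preview-before-execute rule can be pictured as a small piece of bookkeeping. The following is a sketch only: the previewId and approvalTicket names appear in this playbook, but how the platform actually stores and matches them is not described here, so the ApprovalGate class and its methods are hypothetical.

```python
from typing import Optional


class ApprovalGate:
    """Hypothetical model: execution requires a recorded preview and a matching ticket."""

    def __init__(self) -> None:
        # preview_id -> approval ticket (None until a human approves)
        self._tickets: dict[str, Optional[str]] = {}

    def record_preview(self, preview_id: str) -> None:
        """Remember that a mode:"preview" call produced this preview."""
        self._tickets.setdefault(preview_id, None)

    def approve(self, preview_id: str, approval_ticket: str) -> None:
        """Attach a ticket to an existing preview; unknown previews are rejected."""
        if preview_id not in self._tickets:
            raise KeyError(f"no preview recorded for {preview_id}")
        self._tickets[preview_id] = approval_ticket

    def may_execute(self, preview_id: str, approval_ticket: str) -> bool:
        """True only when the preview exists and the ticket matches the approved one."""
        expected = self._tickets.get(preview_id)
        return expected is not None and expected == approval_ticket
```

The point of the model is that there is no path to may_execute returning True that skips either the preview or the human approval.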
Related tools: inspect_system, inspect_jobs, inspect_spool, inspect_security, inspect_messages, manage_job
3. User Provisioning
Purpose: Create or modify user profiles following least-privilege and audit-friendly patterns.
Time: 5–10 minutes (plus approval time)
Risk level: Recommend first, Act only with approval. Never grant *ALLOBJ lightly.
Beginner Context
User profiles are the root of access control, and over-privileged profiles are among the most common audit findings on IBM i. This playbook forces verification of existing authority before any change.
Steps
- Check existing authority (Observe)
  Ask: Check object authority on library QGPL for user DEMOUSR.
  This calls inspect_objects(view:"object_authority").
  Or: List user profiles with *ALLOBJ or *SECADM special authorities.
  This calls inspect_users(view:"special_authorities").
- Plan the new or changed profile (Recommend)
  Ask: Plan creation of user DEMOUSR with text "Demo operations user" and limited capabilities. Use *USER only, no special authorities.
  This calls manage_user_profile(operation:"create", mode:"preview"). The plan will validate the name, suggest a default library, and show the exact CRTUSRPRF command.
- Review the plan
  Confirm the user will not receive *ALLOBJ, that the initial library is appropriate, and that a group profile (if used) is the correct role.
- Execute with approval (Act)
  After a second pair of eyes (or your own security officer role) approves, call manage_user_profile(operation:"create", mode:"execute") with the approval ticket.
- Verify (Observe again)
  Ask: Check user authorities for DEMOUSR and confirm *USE on QGPL.
  This calls inspect_users(view:"authorities").
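Part of the "Review the plan" step can be automated by scanning the previewed CRTUSRPRF command for explicit grants of the authorities this playbook warns about. The sketch below assumes the preview exposes the command as a plain string; the function name and the set of authorities treated as risky are illustrative choices, not platform behavior.

```python
import re

# The two special authorities this playbook singles out; extend as policy requires.
RISKY_SPECIAL_AUTHORITIES = {"*ALLOBJ", "*SECADM"}


def risky_special_authorities(command: str) -> list[str]:
    """Return risky values explicitly listed in the command's SPCAUT parameter."""
    match = re.search(r"SPCAUT\(([^)]*)\)", command, flags=re.IGNORECASE)
    if match is None:
        return []  # SPCAUT omitted: nothing explicit to flag
    requested = {value.upper() for value in match.group(1).split()}
    return sorted(requested & RISKY_SPECIAL_AUTHORITIES)
```

This only catches explicit SPCAUT values. It does not evaluate what a user class such as USRCLS(*SECOFR) would grant implicitly, so it complements the human review rather than replacing it.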
What Success Looks Like
- New user has exactly the special authorities required for the role (usually none for *USER).
- Group profile membership is documented.
- An audit trail (approval request + command log) exists for compliance.
Teach Moments & Common Pitfalls
- Pitfall: Granting *ALLOBJ “just for testing.” It is almost never needed and creates a permanent audit exposure.
- Teach moment: Use group profiles for application roles. Changing one group updates access for dozens of users instantly and makes audits trivial.
- Safety note: All user changes go through manage_user_profile in mode: "preview" first. The AI will never execute CRTUSRPRF or CHGUSRPRF without an explicit approval step.
Related tools: inspect_users, inspect_objects, manage_user_profile
4. PTF Management Cycle
Purpose: Keep the system current with security and functional fixes using the safe, grouped-PTF workflow.
Time: 15–30 minutes planning + apply window
Risk level: Observe + Recommend for ordering; Act only after cover-letter review and approval.
Beginner Context
PTFs fix defects and close vulnerabilities. Applying them individually is error-prone; groups ensure dependencies are met. The platform separates “order the group” from “apply the group.”
Steps
- Check current levels (Observe)
  Ask: Check OS release and installed PTF groups.
  This calls inspect_system(view:"ptf_groups").
  Also: List licensed programs and their release levels.
  This calls inspect_system(view:"licensed_programs").
- Research the target group (Recommend)
  Ask: Plan the CL command 'SNDPTFORD PTFID((SF99730))' to order the PTF group.
  This calls plan_run_ibmi_command to validate the CL command syntax, check prerequisites, and generate the preview and approval ticket.
- Review the plan
  Read the plan output and any relevant cover letters for special instructions, dependencies, and post-apply actions.
- Order the group (Act)
  After approval, execute the approved command using run_ibmi_command with the approvalTicket. The system will download the PTFs into a save file.
- Apply and verify (subsequent shift or maintenance window)
  Use the inspect_system(view:"ptf_groups") tool again to confirm the new levels after the apply.
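The final verify step amounts to comparing the PTF group levels captured before ordering with those seen after the apply. A minimal sketch follows, assuming each snapshot has been reduced to a mapping of group ID to integer level; that mapping, the function name, and the level numbers in the usage note are hypothetical.

```python
from typing import Optional


def level_changes(
    before: dict[str, int], after: dict[str, int]
) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Return {group: (old_level, new_level)} for every group whose level differs.

    A group present in only one snapshot is reported with None on the other side.
    """
    changes: dict[str, tuple[Optional[int], Optional[int]]] = {}
    for group in sorted(set(before) | set(after)):
        old, new = before.get(group), after.get(group)
        if old != new:
            changes[group] = (old, new)
    return changes
```

If the group you applied (SF99730 in the example above) is missing from the result, the apply did not take effect and the plan output and cover letter are worth rereading before retrying.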
What Success Looks Like
- Current PTF group level matches the latest cumulative or security group published by IBM.
- No prerequisite PTFs are missing.
- A documented approval and cover-letter review exists for the change.
Teach Moments & Common Pitfalls
- Pitfall: Ordering a group without checking the cover letter. Some groups require a specific order or temporary changes to system values.
- Teach moment: Cumulative PTF packages (SF99730 etc.) are the safest way to stay current. Individual PTFs are for emergency fixes only.
- Safety note: Direct CL commands are executed through plan_run_ibmi_command and run_ibmi_command, which together provide full visibility and a human authorization gate.
Related tools: inspect_system, plan_run_ibmi_command, run_ibmi_command, get_ibmi_procedure_help
How to Use These Playbooks
- Start every shift with the Daily Health Check.
- When something feels off, run the Triage an Issue playbook before you touch anything.
- User or security changes? Follow the User Provisioning sequence.
- Maintenance window approaching? Use the PTF Management Cycle.
Each playbook is deliberately short so you can follow it while the AI is running the tools in the background. After you complete a playbook, open the Glossary for any term you want to understand more deeply, or jump to the Self-Study Runbook for guided practice exercises.
Remember: the AI proposes, you decide, the platform records. This is how IBM i administration stays both fast and safe while you learn the system.
Last updated: 2026-05-21