The GPT Portfolio Maturity Model: From Experimental to Production
Your organization has 500 Custom GPTs and Projects. How many are actually delivering business value?
If you’re like most enterprises with ChatGPT Enterprise, the honest answer is: you don’t know. Some were built in five minutes during a team meeting. Others represent weeks of careful prompt engineering, tool integration, and user testing. From the admin console, they all look the same.
The GPT Portfolio Maturity Model is a framework for classifying every Custom GPT and Project in your workspace by the depth and quality of its configuration, turning an opaque list into an actionable intelligence map.
The Three Tiers
The model classifies GPTs using a Sophistication Score (1-5), derived from semantic analysis of each GPT’s system prompt, tool configuration, and integration depth.
Tier 1: Experimental (Score 1-2)
What it looks like:
- System prompt is 1-3 sentences (“You are a helpful assistant for marketing”)
- No custom tools or actions configured
- No external integrations
- No structured output format
- Often a test or experiment that was never revisited
What it means: These GPTs represent the natural “try it and see” phase of AI adoption. Someone heard about Custom GPTs, built one in five minutes, and either forgot about it or found it wasn’t useful enough to invest more time.
Typical characteristics:
- Created by someone exploring ChatGPT for the first time
- No clear audience beyond the creator
- No measurable business impact
- Often duplicates a GPT that already exists
Tier 2: Functional (Score 3)
What it looks like:
- Structured system prompt with clear role definition
- Some use of custom instructions (tone, format, constraints)
- May have basic tool usage (code interpreter, file upload)
- Serves a specific use case with a defined audience
- Used regularly by the creator, sometimes shared with team
What it means: These GPTs are delivering value but haven’t been optimized. They solve a real problem, but the implementation has room to grow: better prompt engineering, tool integration, quality safeguards, or broader distribution.
Typical characteristics:
- Created by someone with moderate ChatGPT experience
- Clear business process alignment
- Active usage within a team
- Could benefit from prompt engineering training
Tier 3: Production (Score 4-5)
What it looks like:
- Comprehensive system prompt (500+ words) with detailed instructions
- Structured output formats (JSON, markdown templates, etc.)
- External integrations via Actions (APIs, databases, third-party services)
- Error handling and edge case coverage in the prompt
- Clear documentation or onboarding for users
- Active usage across teams or departments
What it means: These are your organization’s AI assets. They automate real business processes, handle complex workflows, and are relied upon by multiple people. The builders behind them are your AI power users.
Typical characteristics:
- Created by someone with strong prompt engineering skills
- Often integrated into existing workflows
- Measurable time savings or quality improvement
- Others should learn from these implementations
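The score-to-tier mapping described in the three tiers can be sketched in a few lines of Python. This is a minimal illustration, assuming the Sophistication Score arrives as an integer from 1 to 5; the names `Tier` and `tier_for_score` are hypothetical and not taken from any particular tool.

```python
from enum import Enum


class Tier(Enum):
    EXPERIMENTAL = "Experimental"
    FUNCTIONAL = "Functional"
    PRODUCTION = "Production"


def tier_for_score(score: int) -> Tier:
    """Map a 1-5 Sophistication Score to a maturity tier.

    Assumption: scores are whole numbers. Boundaries follow the tier
    definitions: 1-2 Experimental, 3 Functional, 4-5 Production.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    if score <= 2:
        return Tier.EXPERIMENTAL
    if score == 3:
        return Tier.FUNCTIONAL
    return Tier.PRODUCTION
```

Keeping the boundaries in one function means a later change to the thresholds touches a single place.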
The Typical Enterprise Distribution
Across the GPT portfolios we have analyzed, a consistent pattern emerges:
| Tier | Percentage | Implication |
|---|---|---|
| Experimental | ~60% | Natural exploration, but it creates noise, duplicates, and governance overhead |
| Functional | ~25% | The growth opportunity: targeted upskilling moves these toward Production |
| Production | ~15% | The proof of value: these justify the ChatGPT Enterprise investment |
This 60/25/15 distribution isn’t inherently bad. It reflects healthy organic adoption: people experimenting, finding what works, and investing in what delivers value. The problem is when you can’t see this distribution. Without measurement, you can’t:
- Retire the 60% that’s creating noise
- Upskill the 25% that’s close to Production
- Recognize the 15% that’s proving the value of your AI investment
- Learn from the patterns that distinguish Production from Experimental
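To make the distribution visible, you only need the tier label of each GPT. The sketch below computes the percentage per tier from a plain list of labels; the function name `tier_distribution` and the sample data are illustrative assumptions, not output from a real workspace.

```python
from collections import Counter

TIERS = ("Experimental", "Functional", "Production")


def tier_distribution(tiers: list[str]) -> dict[str, float]:
    """Return the percentage of GPTs in each tier, rounded to one decimal."""
    total = len(tiers)
    if total == 0:
        # An empty workspace has no distribution; report zeros.
        return {tier: 0.0 for tier in TIERS}
    counts = Counter(tiers)
    return {tier: round(100 * counts[tier] / total, 1) for tier in TIERS}


# Made-up sample of 20 GPTs that happens to match the 60/25/15 pattern.
sample = ["Experimental"] * 12 + ["Functional"] * 5 + ["Production"] * 3
print(tier_distribution(sample))
```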
Measuring Sophistication
The Sophistication Score isn’t a simple heuristic. It’s derived from semantic analysis of each GPT’s full configuration using LLM-based enrichment. The analysis considers:
Prompt Depth (40% weight)
- Length and structure of system instructions
- Presence of role definition, constraints, and examples
- Edge case handling and error instructions
- Output format specifications
Tool Usage (25% weight)
- Number and type of configured tools/actions
- External API integrations
- File handling capabilities
- Code execution configuration
Integration Depth (20% weight)
- Connection to external data sources
- Webhook or API action configurations
- Authentication and security setup
- Multi-step workflow orchestration
User Experience (15% weight)
- Conversation starters / onboarding
- Clear audience definition
- Documentation or usage instructions
- Feedback mechanisms
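The four weights above can be combined as a weighted average. The article does not specify how the dimension-level results are produced or combined, so treat the following as a sketch under stated assumptions: each dimension has already been rated on the same 1-5 scale (for example by the LLM-based enrichment step), and the overall score is the weighted sum. The key names and `sophistication_score` are hypothetical.

```python
# Weights taken from the four dimensions listed above; they sum to 1.0.
WEIGHTS = {
    "prompt_depth": 0.40,
    "tool_usage": 0.25,
    "integration_depth": 0.20,
    "user_experience": 0.15,
}


def sophistication_score(subscores: dict[str, float]) -> float:
    """Combine per-dimension ratings (assumed 1-5 each) into one weighted score."""
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 1 <= subscores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)


# A GPT with a strong prompt but little integration lands near 3 (Functional).
example = {
    "prompt_depth": 4,
    "tool_usage": 3,
    "integration_depth": 2,
    "user_experience": 3,
}
print(round(sophistication_score(example), 2))
```

Because prompt depth carries the largest weight, improving the system prompt moves the overall result more than any other single dimension in this sketch.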
Using the Model
For AI Leaders
The maturity distribution is your first dashboard metric. Track it over time:
- Improving: Experimental percentage shrinking, Functional and Production growing
- Stagnant: Distribution unchanged quarter over quarter, which suggests training isn’t working or isn’t reaching the right people
- Growing but immature: Total GPT count rising but Experimental percentage also rising, which means creation is outpacing capability building
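The three trend labels can be derived from two snapshots of the distribution. The sketch below is one possible reading of those rules; the `Snapshot` type, the 1-percentage-point `tolerance` for "unchanged", and the fallback label "Mixed" are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    total: int               # total number of GPTs in the workspace
    experimental_pct: float
    functional_pct: float
    production_pct: float


def classify_trend(prev: Snapshot, curr: Snapshot, tolerance: float = 1.0) -> str:
    """Label the change between two snapshots of the maturity distribution."""
    d_exp = curr.experimental_pct - prev.experimental_pct
    d_fun = curr.functional_pct - prev.functional_pct
    d_prod = curr.production_pct - prev.production_pct

    # No tier moved by more than the tolerance: treat as unchanged.
    if all(abs(d) <= tolerance for d in (d_exp, d_fun, d_prod)):
        return "Stagnant"
    # More GPTs overall and a larger Experimental share.
    if curr.total > prev.total and d_exp > tolerance:
        return "Growing but immature"
    # Experimental share shrank and neither other tier lost share.
    if d_exp < -tolerance and d_fun >= 0 and d_prod >= 0:
        return "Improving"
    # Anything else (hypothetical catch-all label, not in the model).
    return "Mixed"
```

For example, moving from 60/25/15 to 52/30/18 would be labeled "Improving" under these rules.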
For L&D Teams
Map each tier to specific learning interventions:
| Tier | Learning Intervention |
|---|---|
| Experimental → Functional | Prompt engineering fundamentals, ChatGPT best practices |
| Functional → Production | Advanced prompt patterns, tool/action integration, testing methodologies |
| Production (maintain) | Peer sharing, architecture reviews, advanced API courses |
The key insight: don’t give everyone the same training. An Experimental builder needs different skills than a Functional builder who’s ready for Production.
For Governance Teams
The maturity model informs risk prioritization:
- Experimental GPTs are low risk individually (simple prompts, no integrations) but high risk collectively (noise, duplicates, ungoverned data handling)
- Functional GPTs need review: they’re handling real data but may lack proper safeguards
- Production GPTs warrant full governance review: they’re critical enough that failures have real business impact
For Finance / ROI
The model directly answers the ROI question:
- Count Production GPTs and estimate time savings per GPT
- Identify Functional GPTs close to Production (score 3 with high usage): these are your lowest-hanging fruit for ROI
- Quantify Experimental waste: how many duplicates? How many abandoned?
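The first two ROI steps can be expressed as simple filters over the inventory. This is a rough sketch: the `GptRecord` fields, the hours-saved estimate (which someone has to supply manually), and the default threshold of 50 weekly conversations for "high usage" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GptRecord:
    name: str
    score: int                              # Sophistication Score, 1-5
    weekly_conversations: int               # hypothetical usage metric
    est_hours_saved_per_week: float = 0.0   # manual estimate, hypothetical field


def production_hours_saved(gpts: list[GptRecord]) -> float:
    """Sum the estimated weekly time savings across Production GPTs (score 4-5)."""
    return sum(g.est_hours_saved_per_week for g in gpts if g.score >= 4)


def promotion_candidates(
    gpts: list[GptRecord], min_weekly_conversations: int = 50
) -> list[GptRecord]:
    """Functional GPTs (score 3) with high usage, most used first."""
    candidates = [
        g for g in gpts
        if g.score == 3 and g.weekly_conversations >= min_weekly_conversations
    ]
    return sorted(candidates, key=lambda g: g.weekly_conversations, reverse=True)
```

The usage threshold is the main tuning knob here; a sensible value depends on team size and how usage is measured in your workspace.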
Beyond Static Classification
The maturity model becomes powerful when tracked over time:
Builder Trajectories: Which builders are consistently creating Production-grade GPTs? These are your AI champions. Which are stuck at Experimental? They need targeted support.
Department Patterns: Is Engineering at 30% Production while Marketing is at 5%? That’s not a failure; it’s a data point for where to invest L&D resources.
Training Impact: Did the prompt engineering workshop actually move the needle? Compare maturity distributions before and after for workshop participants.
GPT Lifecycle: How long does it take for a GPT to move from Experimental to Functional? From Functional to Production? This is your AI capability velocity.
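The before-and-after comparison described under Training Impact reduces to a per-tier difference in percentage points. A minimal sketch, assuming you already have the two distributions for the same group of workshop participants; `distribution_shift` and the sample numbers are illustrative only.

```python
TIERS = ("Experimental", "Functional", "Production")


def distribution_shift(
    before: dict[str, float], after: dict[str, float]
) -> dict[str, float]:
    """Percentage-point change per tier between two maturity distributions."""
    return {
        tier: round(after.get(tier, 0.0) - before.get(tier, 0.0), 1)
        for tier in TIERS
    }


# Made-up distributions for one cohort, before and after a workshop.
before = {"Experimental": 70.0, "Functional": 20.0, "Production": 10.0}
after = {"Experimental": 55.0, "Functional": 30.0, "Production": 15.0}
print(distribution_shift(before, after))
```

A negative Experimental value paired with positive Functional and Production values is the pattern you would hope to see after effective training. The same function works for comparing departments or quarters.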
Implementation with AgentsOrg
AgentsOrg implements the GPT Portfolio Maturity Model automatically:
- Discovery: Connects to OpenAI’s Compliance API to fetch your full GPT and Project inventory
- Enrichment: Runs 9 semantic KPI analyses per GPT, including Sophistication Score
- Classification: Maps each GPT to Experimental, Functional, or Production
- Visualization: The Leader Dashboard shows maturity distribution with drill-down to individual GPTs
- Action: Builder Recognition identifies your champions; L&D recommends specific courses for each builder based on actual gaps
Deploy in minutes. Demo mode included for evaluation:
git clone https://github.com/ori129/agentsorg.git
cd agentsorg && make up
The GPT Portfolio Maturity Model is part of AI Transformation Intelligence: a discipline we’re defining in the open. Contribute to the framework or discuss your organization’s experience.