Mastering Amazon Bedrock Inference Profiles

· 11 min read
aws bedrock finops
Amazon Bedrock inference profiles overview diagram

A Practical Guide to Project-Based Cost Tracking and Management on Amazon Bedrock

Amazon Bedrock has changed the way teams deploy and operate generative AI workloads on AWS. You can experiment faster, integrate powerful foundation models easily, and scale without worrying about the underlying infrastructure.

But once AI usage spreads across multiple projects, teams, and environments, a familiar problem appears:

“Who is actually spending how much on Bedrock?”

This is exactly where Amazon Bedrock Inference Profiles become useful. They provide a way to separate usage, track costs per project, and apply operational controls.

In this article, we’ll walk through what inference profiles are, why they matter, and how to use them effectively in real-world AWS environments.

What Are Amazon Bedrock Inference Profiles?

Mastering Amazon Bedrock Inference Profiles — figure
Source

Amazon Bedrock Inference Profiles are custom, logical endpoints that sit in front of foundation models. Instead of calling a model directly, your application calls an inference profile.

You can think of them as dedicated access channels to Bedrock models, each one representing a project, customer, environment, or use case.

With inference profiles, you gain:

Types of Inference Profiles

There are two main types:

  1. System-defined inference profiles
    Pre-created by AWS, mainly used for cross-region routing and internal load balancing.
  2. Application inference profiles
    Created and managed by you. These are the profiles you’ll use for:

This guide focuses primarily on application inference profiles.

Why Use Application Inference Profiles?

1. Granular Cost Tracking

By default, all Bedrock usage is grouped under a single service line item. That makes it difficult to answer simple questions like:

Inference profiles solve this by allowing:

2. Better Operational Control

Each inference profile behaves like an independent endpoint. This means you can:

This is especially useful when multiple teams share the same AWS account.

3. Scalable Multi-Project Management

For organizations running multiple AI initiatives, inference profiles provide:

Setting Up Application Inference Profiles

Prerequisites

Before getting started, make sure you have:

Creating Your First Inference Profile

The exact setup differs slightly by region because model availability and routing rules vary.

Example: US East (us-east-1)

In us-east-1, you can create a profile directly from a foundation model ARN.

import boto3
bedrock = boto3.client("bedrock", region_name="us-east-1")
response = bedrock.create_inference_profile(
inferenceProfileName="ProjectA-USEast",
description="ProjectA cost tracking profile (US East)",
modelSource={
"copyFrom": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
},
tags=[
{"key": "Project", "value": "ProjectA"},
{"key": "Environment", "value": "production"},
{"key": "CostCenter", "value": "AI-Team"},
{"key": "Department", "value": "Engineering"}
]
)
profile_arn = response["inferenceProfileArn"]
print(f"Profile ARN: {profile_arn}")

Example: EU Central (eu-central-1)

In eu-central-1, profiles must be created from system-defined inference profiles, not directly from foundation models.

import boto3
bedrock = boto3.client("bedrock", region_name="eu-central-1")
response = bedrock.create_inference_profile(
inferenceProfileName="ProjectA-EUCentral",
description="ProjectA cost tracking profile (EU Central)",
modelSource={
"copyFrom": "arn:aws:bedrock:eu-central-1:<...>:inference-profile/eu.anthropic.claude-sonnet-4-20250514-v1:0"
},
tags=[
{"key": "Project", "value": "ProjectA"},
{"key": "Environment", "value": "production"},
{"key": "Region", "value": "eu-central-1"},
{"key": "Model", "value": "claude-sonnet-4"}
]
)

Regional Differences (Important)

This is expected behavior and depends on how AWS deploys Bedrock models region by region.

Implementing Cost Tracking

Tagging Strategy (This Really Matters)

Your cost visibility is only as good as your tagging discipline.

A recommended baseline:

recommended_tags = [
{"key": "Project", "value": "ProjectA"},
{"key": "Environment", "value": "production"},
{"key": "CostCenter", "value": "AI-Team"},
{"key": "Department", "value": "Engineering"},
{"key": "Owner", "value": "TeamAlpha"},
{"key": "Region", "value": "eu-central-1"},
{"key": "Model", "value": "claude-sonnet-4"}
]

If you do only one thing after reading this article: standardize your tags early.

Using an Inference Profile in Your Application

Once created, inference profiles can be used exactly like a model ID.

import boto3
import json
def invoke_with_profile(profile_arn, message, region="eu-central-1"):
bedrock_runtime = boto3.client("bedrock-runtime", region_name=region)
body = {
"messages": [{"role": "user", "content": message}],
"max_tokens": 100,
"anthropic_version": "bedrock-2023-05-31"
}
response = bedrock_runtime.invoke_model(
modelId=profile_arn,
body=json.dumps(body)
)
result = json.loads(response["body"].read())
return result

# Example usage
profile_arn = "arn:aws:bedrock:eu-central-1:<...>:application-inference-profile/abc123"
result = invoke_with_profile(profile_arn, "Hello from ProjectA!")

No application refactor required. Just swap the model ID.

Monitoring Costs with AWS Cost Explorer

After usage starts, costs will appear in Cost Explorer (usually within 24–48 hours).

import boto3
from datetime import datetime, timedelta
def get_project_costs(projects, days=30):
ce = boto3.client("ce", region_name="us-east-1")
end_date = datetime.now().strftime("%Y-%m-%d")
start_date = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")
response = ce.get_cost_and_usage(
TimePeriod={"Start": start_date, "End": end_date},
Granularity="MONTHLY",
Metrics=["BlendedCost"],
GroupBy=[{"Type": "TAG", "Key": "Project"}],
Filter={
"And": [
{"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}},
{"Tags": {"Key": "Project", "Values": projects}}
]
}
)
project_costs = {}
for result in response["ResultsByTime"]:
for group in result["Groups"]:
project = group["Keys"][0] if group["Keys"] else "Untagged"
cost = float(group["Metrics"]["BlendedCost"]["Amount"])
project_costs[project] = cost
return project_costs

Real-World Use Cases

  1. Multi-Tenant SaaS

Create one inference profile per customer and track usage cleanly.

2. Department-Based Cost Allocation

Marketing, Engineering, Support, and Research can all share Bedrock while keeping costs separate.

3. Environment Separation

Different profiles for:

Advanced Cost Management

Budget Alerts

import boto3
def setup_project_budget(project_name, monthly_limit=100):
budgets = boto3.client("budgets", region_name="us-east-1")
account_id = boto3.client("sts").get_caller_identity()["Account"]
budgets.create_budget(
AccountId=account_id,
Budget={
"BudgetName": f"{project_name}-Bedrock-Monthly-Budget",
"BudgetLimit": {"Amount": str(monthly_limit), "Unit": "USD"},
"TimeUnit": "MONTHLY",
"BudgetType": "COST",
"CostFilters": {
"Service": ["Amazon Bedrock"],
"TagKey": ["Project"],
"TagValue": [project_name]
}
}
)

CloudWatch Metrics

Inference profiles automatically emit metrics such as:

These are perfect for dashboards and alarms.

Best Practices

Conclusion

AWS Bedrock Inference Profiles are one of those features that seem optional at first, until your AI usage grows. Then they become essential.

With proper use, they allow you to:

Start with a single project, get the tagging right, and build from there. Your future self (and finance team) will thank you.

For more information please check: https://aws.amazon.com/tr/blogs/machine-learning/manage-multi-tenant-amazon-bedrock-costs-using-application-inference-profiles/