Full control with rate limits for our Managed AI models

15 April, 2026

Marc Zimmermann

Manager SaaS

Marc dropped by NETWAYS in 2021 and was promptly taken on. His journey into the world of IT began back in his youth, at first mostly with Windows and DOS, until a friend told him about this thing called "Linux". What can we say: he got hooked and has stayed that way ever since 🙂



With our Managed AI models, NETWAYS Web Services offers you a range of AI models for privacy-compliant use. Depending on your use case, you can integrate general-purpose, embedding, or reranking models into your applications via standardized APIs.

Using AI models via the API leads to variable resource consumption, depending on how and for what purpose the models are used. To give you more predictability and better cost control, we have introduced new options for controlling token throughput in MyNWS.

Rate limits: Global or per API key

API rate limits can now be configured flexibly, either globally for your entire project or granularly for individual API keys. This lets you manage different use cases or teams separately and cap unexpected load peaks.

The limits apply over a rolling 60-second window and can be set independently for input tokens and output tokens. This separation allows much more precise control, particularly in applications where request and response lengths differ widely, such as summarization or structured data extraction.
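To make the rolling window more concrete, here is a minimal client-side sketch in Python. It is only a model for reasoning about the behaviour, not the implementation used by the service: the class name `RollingTokenLimiter`, its methods, and the assumption that usage drops out of the window exactly 60 seconds after it was recorded are all hypothetical.

```python
from collections import deque
import time


class RollingTokenLimiter:
    """Illustrative model of separate input/output token limits
    over a rolling window (hypothetical, client-side only)."""

    def __init__(self, input_limit, output_limit,
                 window_seconds=60.0, clock=time.monotonic):
        self.limits = {"input": input_limit, "output": output_limit}
        self.window_seconds = window_seconds
        self.clock = clock
        # One queue of (timestamp, tokens) entries per token direction.
        self.events = {"input": deque(), "output": deque()}

    def _used(self, kind, now):
        queue = self.events[kind]
        # Drop entries that have left the rolling window.
        while queue and now - queue[0][0] >= self.window_seconds:
            queue.popleft()
        return sum(tokens for _, tokens in queue)

    def remaining(self, kind):
        """Tokens still available for 'input' or 'output' right now."""
        return self.limits[kind] - self._used(kind, self.clock())

    def try_consume(self, kind, tokens):
        """Record usage if it fits into the window; return False otherwise."""
        now = self.clock()
        if self._used(kind, now) + tokens > self.limits[kind]:
            return False
        self.events[kind].append((now, tokens))
        return True
```

Because input and output are tracked in separate queues, a summarization workload can use up its input budget without affecting the output budget, and the other way round.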

Consumption at a glance: Model usage in MyNWS

You can view the actual token consumption per billing month at any time in MyNWS under the Usage tab in the overview of your project.

Managed AI Models usage overview: token usage per model and costs incurred in MyNWS

The usage and costs incurred are displayed here for each available model, broken down into input and output tokens. In addition, the token usage per model is visualized over time in a diagram.

This allows you to track consumption in the current month and compare it with the set limits.

Capacity planning and empirical values

Actual token consumption varies with the use case. As a rough guide, we have observed a split of about 80 % input tokens to 20 % output tokens.
You can use this ratio as a starting point for the initial configuration of your rate limits. After some time, however, you should validate the configured limits against your actual usage profile.
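As a small worked example of this starting point, the following Python sketch splits an expected per-minute token budget by the 80/20 ratio and later compares it with the observed ratio. The function names, the returned keys, and the optional `headroom` factor are assumptions made for illustration; they do not correspond to fields in MyNWS.

```python
def initial_limits(tokens_per_minute, input_share=0.8, headroom=1.0):
    """Split an expected per-minute token budget into input and output limits.

    input_share defaults to the roughly 80 % input share mentioned above;
    headroom is a hypothetical safety factor (1.0 means no extra margin).
    """
    if not 0 < input_share < 1:
        raise ValueError("input_share must be between 0 and 1")
    budget = round(tokens_per_minute * headroom)
    input_limit = round(budget * input_share)
    return {
        "input_tokens_per_minute": input_limit,
        "output_tokens_per_minute": budget - input_limit,
    }


def observed_input_share(input_tokens, output_tokens):
    """Share of input tokens in the actual usage, for validating the split."""
    total = input_tokens + output_tokens
    if total == 0:
        raise ValueError("no usage recorded yet")
    return input_tokens / total
```

For an expected 100,000 tokens per minute, `initial_limits(100_000)` yields 80,000 input and 20,000 output tokens per minute. If `observed_input_share` later turns out to be well away from 0.8 for your workload, that is a signal to adjust the split.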

If you have any questions about configuring or sizing your limits, our MyEngineer® is, of course, at your disposal.
