Tutorial: How to Monitor a ZIO Application Using ZIO's Built-in Metric System?
Introduction
ZIO has a built-in metric system that allows us to monitor the performance of our applications, which is very useful for debugging and tuning. In this tutorial, we are going to learn how to add metrics to an application and then how to connect it to a metric backend, e.g. Prometheus.
Running the Example
To access the code examples, you can clone the ZIO Quickstarts project:
$ git clone https://github.com/zio/zio-quickstarts.git
$ cd zio-quickstarts/zio-quickstart-restful-webservice-metrics
And finally, run the application using sbt:
$ sbt run
Alternatively, to enable hot-reloading and prevent port binding issues, you can use:
$ sbt reStart
If you encounter a "port already in use" error, you can use sbt-revolver to manage server restarts more effectively: the reStart command will start your server, and reStop will properly stop it, releasing the port. To enable this feature, we have included sbt-revolver in the project. For more details, refer to the ZIO HTTP documentation on hot-reloading.
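For reference, sbt-revolver is enabled by adding it as an sbt plugin in project/plugins.sbt. A minimal sketch is shown below; the version number is an assumption, so check the project for the exact one:
// project/plugins.sbt (version is an assumption; use the one pinned in the project)
addSbtPlugin("io.spray" % "sbt-revolver" % "0.10.0")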
Trying a Simple Example
Before applying metrics to our application, let's try a simple example:
import zio._
import zio.metrics.Metric
object MainApp extends ZIOAppDefault {
  private val count = Metric.counterInt("fib_call_total").fromConst(1)

  def fib(n: Int): ZIO[Any, Nothing, Int] =
    if (n <= 1) ZIO.succeed(1)
    else
      for {
        a <- fib(n - 1) @@ count
        b <- fib(n - 2) @@ count
      } yield a + b

  def run =
    for {
      i <- Console.readLine("Please enter a number to calculate fibonacci: ").mapAttempt(_.toInt)
      n <- fib(i) @@ count
      _ <- Console.printLine(s"fib($i) = $n")
      c <- count.value
      _ <- ZIO.debug(s"number of fib calls to calculate fib($i): ${c.count}")
    } yield ()
}
In this example, we calculate the Fibonacci number for a given input. We also count the number of times the fib function is called, using the count metric. Finally, we print the value of the metric as a debug message.
This is a pedagogical example of how to use metrics. In real life, we will probably want to poll the metrics using a web API and feed them to a monitoring system, e.g. Prometheus. In the following sections, we will learn how to do that by applying the metrics to our RESTful web service.
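Before moving on, note that counters are only one of the metric types ZIO provides; gauges, histograms, and frequencies are created from the same zio.metrics.Metric factory and applied with the same @@ aspect syntax. Here is a minimal sketch in which the metric names and recorded values are made up purely for illustration:
import zio._
import zio.metrics.{Metric, MetricKeyType}

object OtherMetricTypes {
  // Gauge: records the latest Double value produced by the wrapped effect
  private val queueSize = Metric.gauge("queue_size")

  // Histogram: records Double samples into the given buckets
  private val latency =
    Metric.histogram(
      "request_latency_seconds",
      MetricKeyType.Histogram.Boundaries.linear(start = 0.0, width = 0.1, count = 10)
    )

  // Frequency: counts occurrences of each distinct String value
  private val httpMethod = Metric.frequency("http_method")

  val program: ZIO[Any, Nothing, Unit] =
    for {
      _ <- ZIO.succeed(42.0) @@ queueSize   // sets the gauge to 42.0
      _ <- ZIO.succeed(0.25) @@ latency     // records a 0.25-second sample
      _ <- ZIO.succeed("GET") @@ httpMethod // increments the "GET" occurrence count
    } yield ()
}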
Built-in ZIO HTTP Metrics
ZIO HTTP has built-in support for metrics. We can attach the metrics middleware to our HTTP application using the @@ syntax:
import zio._
import zio.http._
import zio.schema.codec.JsonCodec.schemaBasedBinaryCodec
object UserRoutes {
  def apply(): Routes[UserRepo, Response] =
    Routes(
      Method.GET / "users" -> handler {
        UserRepo.users.foldZIO(
          e =>
            ZIO.logError(s"Failed to retrieve users. $e") *>
              ZIO.fail(Response.internalServerError("Cannot retrieve users!")),
          users =>
            ZIO
              .log(s"Retrieved users successfully: response length=${users.length}")
              .as(Response(body = Body.from(users)))
        )
      }
    ) @@ Middleware.metrics()
}
The metrics middleware is attached to all the routes in UserRoutes. Currently, it only collects metrics for requests to the /users endpoint, but if we add more routes to UserRoutes, all of them will be covered by the metrics middleware.
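Besides the built-in middleware, we can also record application-level metrics inside individual handlers using ZIO's Metric API directly. The following is a hypothetical sketch: the metric name, the "source" tag, and the UserRepo.register call are illustrative and not part of the quickstart:
import zio._
import zio.metrics.Metric

object UserMetrics {
  // Counts successfully registered users, tagged with a static label
  val usersRegistered =
    Metric.counter("users_registered_total").fromConst(1L).tagged("source", "http")
}

// Inside a hypothetical POST /users handler, the counter could be applied as an aspect:
//   UserRepo.register(user) @@ UserMetrics.usersRegistered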
After adding the metrics to routes, it is time to serve the metrics as a RESTful API. Before that, let's add the required dependencies to our project.
Dependencies
In the following sections, we are going to utilize the zio-metrics-connectors module from the ZIO ZMX project and also expose the metrics as a REST API. So let's add the following dependencies to our project:
libraryDependencies += "dev.zio" %% "zio-metrics-connectors" % "2.3.1"
libraryDependencies += "dev.zio" %% "zio-metrics-connectors-prometheus" % "2.3.1"
This module provides connectors for various metric backends, e.g. Prometheus.
Serving Prometheus Metrics
The following snippet shows how to provide an HTTP endpoint that exposes the metrics as a REST API for Prometheus:
import zio.http._
import zio._
import zio.metrics.connectors.prometheus.PrometheusPublisher
object PrometheusPublisherRoutes {
  def apply(): Routes[PrometheusPublisher, Nothing] = {
    Routes(
      Method.GET / "metrics" ->
        handler(
          ZIO.serviceWithZIO[PrometheusPublisher](_.get.map(Response.text))
        )
    )
  }
}
Next, we need to add PrometheusPublisherRoutes to the routes served by our application:
import zio._
import zio.http._
import zio.metrics.connectors.{MetricsConfig, prometheus}
object MainApp extends ZIOAppDefault {
  private val metricsConfig = ZLayer.succeed(MetricsConfig(1.seconds))

  def run =
    Server.serve(
      GreetingRoutes() ++ DownloadRoutes() ++ CounterRoutes() ++ UserRoutes() ++ PrometheusPublisherRoutes()
    ).provide(
      Server.default,
      // A layer responsible for storing the state of the `counterApp`
      ZLayer.fromZIO(Ref.make(0)),
      // To use the persistence layer, provide the `PersistentUserRepo.layer` layer instead
      InmemoryUserRepo.layer,
      // Configuration for the metric backends
      metricsConfig,
      // The Prometheus reporting layers
      prometheus.publisherLayer,
      prometheus.prometheusLayer,
    )
}
Testing the Metrics
Now that we have the metrics as a REST API, we can test them. Let's run the application and then send some requests to it, as shown below:
$ curl -i http://localhost:8080/users -d '{"name": "John", "age": 42}'
$ curl -i http://localhost:8080/users -d '{"name": "Jane", "age": 43}'
$ curl -i http://localhost:8080/users
If we fetch the metrics from the "/metrics" endpoint, we will see the metrics in the Prometheus format, like below:
$ curl -i http://localhost:8080/metrics
HTTP/1.1 200 OK
content-type: text/plain
date: Tue, 30 Apr 2024 18:58:26 GMT
content-length: 4801
# TYPE http_concurrent_requests_total gauge
# HELP http_concurrent_requests_total
http_concurrent_requests_total{method="GET",path="/users",} 0.0 1714503503829
# TYPE http_concurrent_requests_total gauge
# HELP http_concurrent_requests_total
http_concurrent_requests_total{method="POST",path="/users",} 0.0 1714503503829
# TYPE http_request_duration_seconds histogram
# HELP http_request_duration_seconds
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="0.005",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="0.01",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="0.025",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="0.05",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="0.075",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="0.1",} 2.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="0.25",} 2.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="0.5",} 2.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="0.75",} 2.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="1.0",} 2.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="2.5",} 2.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="5.0",} 2.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="7.5",} 2.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="10.0",} 2.0 1714503503829
http_request_duration_seconds_bucket{method="POST",path="/users",status="200",le="+Inf",} 2.0 1714503503829
http_request_duration_seconds_sum{method="POST",path="/users",status="200",} 0.100570365 1714503503829
http_request_duration_seconds_count{method="POST",path="/users",status="200",} 2.0 1714503503829
http_request_duration_seconds_min{method="POST",path="/users",status="200",} 0.00120463 1714503503829
http_request_duration_seconds_max{method="POST",path="/users",status="200",} 0.099365735 1714503503829
# TYPE http_request_duration_seconds histogram
# HELP http_request_duration_seconds
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="0.005",} 0.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="0.01",} 0.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="0.025",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="0.05",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="0.075",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="0.1",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="0.25",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="0.5",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="0.75",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="1.0",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="2.5",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="5.0",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="7.5",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="10.0",} 1.0 1714503503829
http_request_duration_seconds_bucket{method="GET",path="/users",status="200",le="+Inf",} 1.0 1714503503829
http_request_duration_seconds_sum{method="GET",path="/users",status="200",} 0.017157212 1714503503829
http_request_duration_seconds_count{method="GET",path="/users",status="200",} 1.0 1714503503829
http_request_duration_seconds_min{method="GET",path="/users",status="200",} 0.017157212 1714503503829
http_request_duration_seconds_max{method="GET",path="/users",status="200",} 0.017157212 1714503503829
# TYPE http_requests_total counter
# HELP http_requests_total
http_requests_total{method="POST",path="/users",status="200",} 2.0 1714503503829
# TYPE http_requests_total counter
# HELP http_requests_total
http_requests_total{method="GET",path="/users",status="200",} 1.0 1714503503829⏎
Now that we have the metrics as a REST API, we can add this endpoint to our Prometheus server to fetch the metrics periodically.
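As a rough sketch, a Prometheus scrape configuration for this endpoint might look like the following; the job name and scrape interval are arbitrary choices, and the target should match your deployment:
scrape_configs:
  - job_name: "zio-restful-webservice"
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8080"]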
Conclusion
In this tutorial, we have learned how to define metrics and apply them to our application. We have also learned how to expose the metrics as a REST API, which can then be polled by a Prometheus server.
All the source code associated with this article is available in the ZIO Quickstarts project on GitHub.