OpenTelemetry Support [Experimental]


OpenTelemetry https://opentelemetry.io/

OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.

OpenTelemetry aims to become the open standard for instrumenting code with logs, metrics, and traces. The project is maturing rapidly and has stable, successful APIs to prove it. We think that’s a noble aim, and we want to offer first-class support for OpenTelemetry in the RedwoodJS framework.

RedwoodJS + OpenTelemetry = Easy

Setup

To set up OpenTelemetry, you can run one simple command:

yarn rw experimental setup-opentelemetry

or use the exp abbreviation of the new experimental CLI section like so:

yarn rw exp setup-opentelemetry

After that, you’ll need to update your GraphQL handler configuration to include the new openTelemetryOptions option:

export const handler = createGraphQLHandler({
  authDecoder,
  getCurrentUser,
  loggerConfig: { logger, options: {} },
  directives,
  sdls,
  services,
  onException: () => {
    // Disconnect from your database with an unhandled exception.
    db.$disconnect()
  },
  // This is new and you'll need to manually add it
  openTelemetryOptions: {
    resolvers: true,
    result: true,
    variables: true,
  }
})

This allows OpenTelemetry instrumentation inside your GraphQL server.

Changes

TOML
The setup command adds the following values to the redwood.toml file:

[experimental.opentelemetry]
  enabled = true
  apiSdk = "/home/linux-user/redwood-project/api/src/opentelemetry.js"

The enabled option simply turns OpenTelemetry on or off. Beware that this doesn’t turn off the GraphQL OpenTelemetry instrumentation; you’ll need to remove the plugin options (openTelemetryOptions) from your GraphQL handler for that to happen. The apiSdk option should point to the JS file that is loaded before your application code to set up the OpenTelemetry SDK. You will likely want to leave this as the default value.
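Because the TOML flag and the GraphQL options are two separate switches, one way to keep them in step is to build the handler options from a single boolean. This is only a sketch under stated assumptions: HandlerOptions is a trimmed-down stand-in for what createGraphQLHandler accepts, and buildHandlerOptions and its otelEnabled argument are hypothetical names that are not part of Redwood.

```typescript
// The three flags shown in the handler example above.
type OpenTelemetryOptions = {
  resolvers: boolean
  result: boolean
  variables: boolean
}

// Trimmed-down stand-in for the createGraphQLHandler options (assumption:
// the real options object has more fields, which are omitted here).
type HandlerOptions = {
  sdls: object
  services: object
  openTelemetryOptions?: OpenTelemetryOptions
}

// Hypothetical helper: includes openTelemetryOptions only when the flag is
// on, and leaves the key out entirely when it is off.
function buildHandlerOptions(otelEnabled: boolean): HandlerOptions {
  const base: HandlerOptions = { sdls: {}, services: {} }
  if (!otelEnabled) {
    return base
  }
  return {
    ...base,
    openTelemetryOptions: { resolvers: true, result: true, variables: true },
  }
}

console.log('openTelemetryOptions' in buildHandlerOptions(false)) // false
console.log('openTelemetryOptions' in buildHandlerOptions(true)) // true
```

In a real project the boolean could come from wherever you prefer to keep the switch, and the resulting object would be spread into your existing createGraphQLHandler call.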

SDK File
The api/src/opentelemetry.js|ts file generated by the setup command is where the OpenTelemetry SDK is defined. For more information on the contents of this file - including what options you may wish to edit to suit your own needs - please see the documentation at Manual | OpenTelemetry.

Availability

The setup command is currently available in the canary version of Redwood. You can try this out in a new project by running yarn rw upgrade --tag canary and following any general upgrade steps recommended on the forums.

Limitations

Currently we only support OpenTelemetry on the “api” side of Redwood, but we aim to add the “web” side soon.

At the moment we also only support OpenTelemetry during development: yarn rw dev will automatically enable OpenTelemetry when you have it set up and enabled within the TOML. Other commands, like yarn rw serve, do not currently do this, but we hope to add this in the future too.

Known Issues

There are a few known issues which we will be addressing shortly:

  1. Services which return a promise but that are not marked as async do not have the correct timings reported.
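
To illustrate how this kind of timing error can arise, here is a minimal sketch. It is not Redwood's actual instrumentation, and traceNaive, tracePromiseAware, and the posts service are hypothetical names. The assumption is that a wrapper which ends its span as soon as the service function returns will stop the clock while the returned promise is still pending.

```typescript
type Span = { name: string; ended: boolean }

// Ends the span as soon as fn returns, even if fn returned a pending promise.
function traceNaive<T>(name: string, fn: () => T): { span: Span; result: T } {
  const span: Span = { name, ended: false }
  const result = fn()
  span.ended = true
  return { span, result }
}

// True for promises and any other object with a then() method.
function isThenable(value: unknown): value is PromiseLike<unknown> {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { then?: unknown }).then === 'function'
  )
}

// Ends the span only once a returned promise settles (resolve or reject).
function tracePromiseAware<T>(
  name: string,
  fn: () => T
): { span: Span; result: T } {
  const span: Span = { name, ended: false }
  const result = fn()
  if (isThenable(result)) {
    const end = () => {
      span.ended = true
    }
    result.then(end, end)
  } else {
    span.ended = true
  }
  return { span, result }
}

// Hypothetical service: returns a promise but is not marked async.
const posts = (): Promise<string[]> =>
  new Promise((resolve) => setTimeout(() => resolve(['first post']), 20))

const naive = traceNaive('posts', posts)
const aware = tracePromiseAware('posts', posts)

console.log(naive.span.ended) // true: the span closed before the data arrived
console.log(aware.span.ended) // false: still waiting for the promise to settle
```

The naive span would report a near-zero duration for a service that actually takes about 20ms, which matches the symptom described in the list above.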

Feedback

Please leave feedback as comments to this forum post. We would love to hear what’s broken, what isn’t clear and what additions or changes you’d like to see!

We would also welcome any form of collaboration on this feature!

Hi,
I tried out Redwood Studio and OpenTelemetry. First of all, it is super nice to get such thorough insights into queries!
When I am using it, I sometimes get these logs on my api side:

{"stack":"Error: connect ECONNREFUSED 127.0.0.1:4318\n    at __node_internal_captureLargerStackTrace (node:internal/errors:484:5)\n    at __node_internal_exceptionWithHostPort (node:internal/errors:662:12)\n    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1300:16)\n    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:130:17)","message":"connect ECONNREFUSED 127.0.0.1:4318","errno":"-4078","code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":"4318","name":"Error"}

And some of the queries are not named properly, but I guess this is a known issue.
Another thing I observed is decreased performance. Queries that would normally take around 80ms can suddenly take up to 4000ms (querying a collection of 1000 items), and if I have relational resolvers, they seem to lead to N+1 issues, resulting in a query that takes 20000ms. The opentelemetry.ts file mentions a SimpleSpanProcessor and says that BatchSpanProcessor is preferable in production. I tried it in dev, but I don’t think it changed much.
I am now wondering whether the decreased performance also influences the analysis, and whether I can rely on the traces.
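
For context, the difference between the two processors can be sketched with a toy model. These classes are illustrative stand-ins written for this post, not the OpenTelemetry SDK classes themselves: the real BatchSpanProcessor also flushes on a timer and has queue limits, which are left out here. The point is only that a simple processor makes one export call per finished span, while a batch processor groups them.

```typescript
type Span = { name: string }

// Toy exporter that only counts how many export calls it receives.
class CountingExporter {
  calls = 0
  spans = 0
  export(batch: Span[]): void {
    this.calls += 1
    this.spans += batch.length
  }
}

// Toy "simple" processor: one export call per finished span.
class ToySimpleProcessor {
  private exporter: CountingExporter
  constructor(exporter: CountingExporter) {
    this.exporter = exporter
  }
  onEnd(span: Span): void {
    this.exporter.export([span])
  }
}

// Toy "batch" processor: buffers spans and exports them in groups.
class ToyBatchProcessor {
  private exporter: CountingExporter
  private maxBatchSize: number
  private buffer: Span[] = []
  constructor(exporter: CountingExporter, maxBatchSize: number) {
    this.exporter = exporter
    this.maxBatchSize = maxBatchSize
  }
  onEnd(span: Span): void {
    this.buffer.push(span)
    if (this.buffer.length >= this.maxBatchSize) {
      this.flush()
    }
  }
  // Exports whatever is buffered; does nothing when the buffer is empty.
  flush(): void {
    if (this.buffer.length === 0) {
      return
    }
    this.exporter.export(this.buffer)
    this.buffer = []
  }
}

const simpleExporter = new CountingExporter()
const batchExporter = new CountingExporter()
const simple = new ToySimpleProcessor(simpleExporter)
const batch = new ToyBatchProcessor(batchExporter, 256)

// Simulate 1000 resolver spans, as in a query over 1000 items.
for (let i = 0; i < 1000; i++) {
  const span: Span = { name: `resolver-${i}` }
  simple.onEnd(span)
  batch.onEnd(span)
}
batch.flush()

console.log(simpleExporter.calls) // 1000
console.log(batchExporter.calls) // 4
```

Both exporters end up with the same 1000 spans; only the number of export calls differs. Whether that accounts for the slowdown described above is not something this sketch can show.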

I hope we see more of this tooling; it would be really helpful for production, too 🙂

Thanks for the feedback @dennemark, it’s really appreciated!

I have to admit I did not test with collections that large. I’ll make a note to try this out and see if I can reproduce this poor performance and track down the cause. We certainly can’t have this take seconds!

The naming issues are surprising; I’ve not come across that before. Is there any more information you can provide about that? Was it any particular type of query or mutation? Likewise, I’ve never seen the issue with not being able to communicate with the local server at 4318 when Studio was run from the CLI. I’ll make sure to keep an eye out for these too when I investigate this poor performance.

One thing I do have to get back to and fix/check is around services returning promises. Right now the timing for a service doesn’t include the time taken to resolve any promise it returns. That could also be an issue in other places where Redwood lets you return a promise that we’ll resolve internally at some point.

Thanks again for the feedback! We need people trying it out, pointing out what needs to be better or fixed, and telling us whether they find it useful. This lets us justify spending time on these experimental features.

Ah I thought the naming issue might be related to:

It seems Redwood Studio is collecting all queries in a trace and sometimes it takes a bit longer. But there are cases where it only shows a blank title as if it did not fully collect the trace.
