PM2 API Start Issues - Baremetal Deploy

I am trying to work through the baremetal deploy for the API only. I have all the stages of the deploy running from the development server. However, pm2 is failing to start the server with the following error.

```
Error: Command failed with exit code 1: yarn node server.js --apiRootPath /
    at makeError (/var/www/SpicyFriends/20240812210807/node_modules/execa/lib/error.js:60:11)
    at handlePromise (/var/www/SpicyFriends/20240812210807/node_modules/execa/index.js:118:26)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async apiServerFileHandler (/var/www/SpicyFriends/20240812210807/node_modules/@redwoodjs/cli/dist/commands/serveHandler.js:92:3)
    at async Object.handler (/var/www/SpicyFriends/20240812210807/node_modules/@redwoodjs/cli/dist/commands/serve.js:81:9)
    at async runYargs (/var/www/SpicyFriends/20240812210807/node_modules/@redwoodjs/cli/dist/index.js:151:3)
    at async /var/www/SpicyFriends/20240812210807/node_modules/@redwoodjs/cli/dist/index.js:105:7
    at async main (/var/www/SpicyFriends/20240812210807/node_modules/@redwoodjs/cli/dist/index.js:94:3)
```

If I run `yarn node server.js --apiRootPath /` manually in the API dist folder, it works, and `yarn rw api serve` works too. I suspect there is some kind of environment issue, but I am struggling to figure out where to look next. I'd appreciate any help anyone can provide.

Thanks
Mike

I just read the 8.0 upgrade notes and saw this:

> In that case there are additional pm2-specific actions you must take when upgrading your system from v18 to v20.

I am currently running Node v20.16. I wonder if missing these extra actions is the root cause of my issue.

Hey @mloder :wave:

@rob is our resident baremetal pm2 guy, so he'll likely be the best person to help out here.

I think this note about startup and unstartup was what helped Rob and me move forward on a similar sort of issue. Perhaps that could help?
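If it's the same note I'm thinking of, the gist was to re-register pm2 after the Node upgrade: `pm2 unstartup` to remove the old boot script, `pm2 startup` to generate a new one, then `pm2 save` so the process list survives a reboot. The generated startup script hard-codes the path to the Node binary, so it keeps pointing at v18 until you regenerate it.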

Hi Josh :wave:

Thanks for the suggestion on this. I tried the steps you pointed out and it didn't make a difference.

I am running Redwood 7.7.2. I switched the ecosystem.config.js file to use "fork" mode, and a more detailed exception showed up indicating the port is already in use, which makes sense to me if multiple independent instances are forked. I also noticed that in "cluster" mode the first instance works but the second keeps failing per the issue above. The exception is not as detailed in cluster mode, but I suspect it's the same issue.

I have very little experience with PM2, but it's my understanding that it's designed for scaling instances of the API. I'm not sure how that works with multiple instances opening ports without conflicting. Still digging, but I'd appreciate any more help if you have it.

```
1|api    | node:net:1904
1|api    |     const ex = new UVExceptionWithHostPort(err, 'listen', address, port);
1|api    |                ^
1|api    | Error: listen EADDRINUSE: address already in use 0.0.0.0:8911
1|api    |     at Server.setupListenHandle [as _listen2] (node:net:1904:16)
1|api    |     at listenInCluster (node:net:1961:12)
1|api    |     at doListen (node:net:2135:7)
1|api    |     at process.processTicksAndRejections (node:internal/process/task_queues:83:21) {
1|api    |   code: 'EADDRINUSE',
1|api    |   errno: -98,
1|api    |   syscall: 'listen',
1|api    |   address: '0.0.0.0',
1|api    |   port: 8911
1|api    | }
1|api    | Node.js v20.16.0
1|api    | Error: Command failed with exit code 1: yarn node server.js --apiRootPath /
1|api    |     at makeError (/var/www/SpicyFriends/20240813125422/node_modules/execa/lib/error.js:60:11)
1|api    |     at handlePromise (/var/www/SpicyFriends/20240813125422/node_modules/execa/index.js:118:26)
1|api    |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
1|api    |     at async apiServerFileHandler (/var/www/SpicyFriends/20240813125422/node_modules/@redwoodjs/cli/dist/commands/serveHandler.js:92:3)
1|api    |     at async Object.handler (/var/www/SpicyFriends/20240813125422/node_modules/@redwoodjs/cli/dist/commands/serve.js:81:9)
1|api    |     at async runYargs (/var/www/SpicyFriends/20240813125422/node_modules/@redwoodjs/cli/dist/index.js:151:3)
1|api    |     at async /var/www/SpicyFriends/20240813125422/node_modules/@redwoodjs/cli/dist/index.js:105:7
1|api    |     at async main (/var/www/SpicyFriends/20240813125422/node_modules/@redwoodjs/cli/dist/index.js:94:3)
1|api    | Need help?
1|api    |  - Not sure about something or need advice? Reach out on our Forum (https://community.redwoodjs.com/)
1|api    |  - Think you've found a bug? Open an issue on our GitHub (https://github.com/redwoodjs/redwood)
1|api    |  - Here's your unique error reference to quote: '873569ad-12aa-42d3-8b9a-3a6637224ff7'
```

Mike

Hello! Sorry about that…I have to admit I was brand new to pm2 when I started working on the baremetal deploy option; I just needed something to keep the services running, and pm2 seemed to be the most common tool in use for doing it!

In the ecosystem.config.js file there's a line, `instances: 'max'`, that tells pm2 to start as many instances as there are CPUs on your system (in theory maximizing the performance of the cluster). If you change that to just 1 (not sure if you can use a number or if it has to be a string?), does the service start reliably with `pm2 start api`?

Josh and I thought it was weird that pm2 could somehow start multiple instances of the app running on the same port, but we found this in a Stack Overflow answer:

> Node in cluster mode creates one master and then spawns worker processes which share the TCP connection, so basically load is distributed among the workers.
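My rough mental model of what that answer is describing, sketched with Node's built-in cluster module (which is what pm2's cluster mode uses under the hood, as far as I understand; the port and handler here are just for illustration):

```js
// sketch: how Node's cluster module lets many workers share one port
const cluster = require('node:cluster')
const http = require('node:http')
const os = require('node:os')

if (cluster.isPrimary) {
  // The primary opens no port itself; it forks workers and hands
  // incoming connections off to them.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork()
  }
} else {
  // Every worker calls listen(8911), but the primary owns the actual
  // socket and round-robins connections across the workers.
  http
    .createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
    .listen(8911)
}
```

Every worker thinks it's listening on 8911, but only the primary really binds the port, which is why multiple instances on one port can work at all.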

I’m running an app in production right now that’s in cluster mode and has 2 instances of the api running (both on what appears to be port 8911) and it’s working great. So I’m not sure what would be preventing you from doing the same…

Here’s my ecosystem.config.js:

```js
module.exports = {
  apps: [
    {
      name: 'api',
      cwd: 'current',
      script: 'node_modules/.bin/rw',
      args: 'serve api',
      instances: 'max',
      exec_mode: 'cluster',
      wait_ready: true,
      listen_timeout: 10000,
    },
  ],
}
```
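For the single-instance test I mentioned above, the only field you'd change is this one (I believe pm2 accepts a bare number here, though the string form should work too):

```js
{
  // ...same config as above, with only this field changed:
  instances: 1, // was 'max'
  // ...
}
```

One caveat, if I remember right: pm2 won't re-read the ecosystem file on a plain restart, so do `pm2 delete api` and then start it again to pick up the change.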

Hi Rob,

Thanks for the response. My ecosystem.config.js is identical to yours, and it produces the issue. When I change it to `instances: '1'` the API starts up properly and works in both fork and cluster mode. When it's set to `'max'` it starts 2 instances, but the second one crashes. I assume it's the open-port-in-use issue, as that's what the fork option produces.

Mike

Yeah, I can see that fork mode would definitely cause port collisions, as it's just starting multiple stand-alone copies of the app, completely independent of each other. There's supposed to be a way to tell pm2 to send different config options to each instance, like -p 8911 to one and -p 8912 to the second. You could do that, then just update nginx or whatever is sitting out front to send traffic to both ports.
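Something like this is the sketch I have in mind for that route; I'm assuming here that `rw serve api` accepts a --port flag on your version (worth checking with `yarn rw serve api --help`):

```js
// sketch: fork mode with one explicit port per instance to avoid collisions
// (assumes `rw serve api --port` is supported on your Redwood version)
module.exports = {
  apps: [
    {
      name: 'api-8911',
      cwd: 'current',
      script: 'node_modules/.bin/rw',
      args: 'serve api --port 8911',
      exec_mode: 'fork',
    },
    {
      name: 'api-8912',
      cwd: 'current',
      script: 'node_modules/.bin/rw',
      args: 'serve api --port 8912',
      exec_mode: 'fork',
    },
  ],
}
```

Then nginx (or whatever's out front) would need an upstream block pointing at both ports.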

But, cluster mode is supposed to avoid all that and just magically work! I found another doc that states:

> • Automatic port sharing: You can run multiple processes on the same port by using the cluster module of Node.js to enable load balancing and port sharing.

Have you looked at the pm2 logs themselves to see if there’s any more info? Check in ~/.pm2/logs…

Here’s a thread of people complaining about this very thing happening (cluster mode, first instance works but no others). This solution says it’s an issue with npm run … are you using npm or yarn? I’m using yarn in my app…

Hi Rob,

Really appreciate you digging in to help me with this. I am using yarn on my development machine, and I believe `yarn rw deploy baremetal production` runs yarn on the remote side as well.

The ecosystem.config.js is set up to run `node_modules/.bin/rw` as the main entry point, so I don't think the npm issue would apply, but I am not 100% sure.

The pm2 logs actually do have more detail: they point to the same address-already-in-use error, so they show more than the `pm2 monit` output does.

```
Node.js v20.16.0
2024-08-14T14:48:15: PM2 log: App name:api id:1 disconnected
2024-08-14T14:48:15: PM2 log: App [api:1] exited with code [1] via signal [SIGINT]
2024-08-14T14:48:15: PM2 log: App [api:1] starting in -cluster mode-
Importing Server Functions...
{"level":40,"time":1723646905019,"pid":49352,"hostname":"ubuntu","module":"mailer","msg":"Automatically loaded the '@redwoodjs/mailer-handler-in-memory' handler, this will be used to process mail in test mode"}
2024-08-14T14:48:25: PM2 log: App [api:1] online
/auth 3549 ms
/healthz 3492 ms
...Done importing in 3557 ms
{"level":40,"time":1723646905026,"pid":49352,"hostname":"ubuntu","module":"mailer","msg":"Automatically loaded the '@redwoodjs/mailer-handler-studio' handler, this will be used to process mail in development mode"}
GraphQL Yoga Server endpoint at graphql
GraphQL Yoga Server Health Check endpoint at graphql/health
GraphQL Yoga Server Readiness endpoint at graphql/readiness
node:net:1904
    const ex = new UVExceptionWithHostPort(err, 'listen', address, port);
               ^

Error: listen EADDRINUSE: address already in use 0.0.0.0:8911
    at Server.setupListenHandle [as _listen2] (node:net:1904:16)
    at listenInCluster (node:net:1961:12)
    at doListen (node:net:2135:7)
    at process.processTicksAndRejections (node:internal/process/task_queues:83:21) {
  code: 'EADDRINUSE',
  errno: -98,
  syscall: 'listen',
  address: '0.0.0.0',
  port: 8911
}

Node.js v20.16.0
2024-08-14T14:48:26: PM2 log: App name:api id:1 disconnected
2024-08-14T14:48:26: PM2 log: App [api:1] exited with code [1] via signal [SIGINT]
2024-08-14T14:48:26: PM2 log: App [api:1] starting in -cluster mode-
```

At least this shows for sure that this is the issue, so I can focus on trying to figure it out.

Mike

So strange…cluster mode is specifically designed so that the processes can share ports!

At least we know you can run `instances: 1` in the meantime so your app can be up and running. It might be performant enough with just the single instance to handle your needs…not the most redundant solution if something goes wrong, though.
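(If you want to experiment with the count later without editing the file, I believe `pm2 scale api 2` adjusts the number of instances on the fly.)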

Thanks Rob,

The single instance should be more than fine for now; not sure about the future, but that's an issue I can work through later.

Appreciate the help
Mike
