Hey all!
I’m trying to use puppeteer-core and chrome-aws-lambda to crawl public sites and collect relevant data for my users. But it seems that, since chrome-aws-lambda includes an entire browser, adding these packages to the api workspace brings my graphql function to a whopping 85 MB compressed / 268 MB uncompressed, which is well over the 66 MB compressed / 250 MB uncompressed limit imposed by Netlify / AWS.
So unless I’m overlooking something, it seems that I can’t use these packages on the api side. I’ve considered the following workarounds and would greatly appreciate any thoughts you might have:
- Move off of Netlify’s serverless hosting and onto a “server-full” hosting option like Heroku (I’m pretty sure I’ve seen this mentioned as an option in the forums, though I haven’t researched it closely)
  - Pros: this should remove the limit and let me use puppeteer
  - Cons: likely complicates the deployment and development process, and moves further away from the vision of Redwood
- Continue with the serverless hosting, but move all puppeteer code onto its own server that shares the same Heroku database and runs the crawls on a cron job
  - Pros: the development process doesn’t get complicated, and puppeteer can run
  - Cons: without an API of its own, the puppeteer code couldn’t be triggered by user actions, which isn’t ideal
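For the second option, the separate server could be as small as a standalone worker script kicked off by cron or Heroku Scheduler — a rough sketch, assuming the full `puppeteer` package (no bundle-size limit on a regular server); `TARGET_URLS` and the database write are placeholders:

```javascript
// worker.js — run on a schedule, e.g. Heroku Scheduler: `node worker.js`.
// Uses the full `puppeteer` package, which downloads its own Chromium;
// that's fine here because there's no serverless bundle-size limit.
const TARGET_URLS = ['https://example.com']; // placeholder crawl list

async function crawlAll(urls) {
  const puppeteer = require('puppeteer'); // lazy require, loaded at run time
  const browser = await puppeteer.launch({ headless: true });
  const results = [];
  try {
    for (const url of urls) {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: 'networkidle2' });
      results.push({ url, title: await page.title() }); // placeholder scrape
      await page.close();
    }
  } finally {
    await browser.close();
  }
  return results;
}

// The results would then be written into the shared Heroku Postgres
// (e.g. via Prisma), where the api side can read them:
// crawlAll(TARGET_URLS).then(saveResults); // saveResults is hypothetical
```

The downside stands, though: a pure cron worker only runs on its schedule, so user actions can’t trigger a crawl without bolting an HTTP endpoint onto it.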
Right now I’m leaning toward the second option, but I’m curious whether anyone here (1) is aware of some way I can do this entirely within RW, or (2) can think of a more elegant / less time-consuming workaround than the above.
Thank you in advance! And, yes, trying to run a browser within a lambda function is kind of ridiculous.