Loading data from big files

Hi there!

I am very new to RedwoodJS and to JavaScript on the back end. I chose to try out RedwoodJS for a test assignment, and part of the assignment is to create a front-end view where the user can upload files and it loads the data into the database.

But I’m having trouble reading in the bigger files (the biggest are around 600MB with 1M lines). I’m currently using FileReader on the front end to load in the file and then send it through mutations to the back end. My problems, however, are:

  1. I can’t seem to get createMany working on the back end, so currently it is inserting every row one by one.
  2. The request that the mutation makes has a size limit I haven’t found a way around, so currently it can only take an array of around 300 objects.

If anyone could point me in the right direction about how to tackle this issue I would greatly appreciate it!

For files of that size, you may want to consider using PG COPY.

Or even importing via TablePlus.

Depending on what you are loading, you really don’t want to load 600MB into memory and then send 600MB of data in a createMany().

Perhaps load the CSV stream incrementally and send a createMany every 10,000 rows. But that will only work locally, if you have access to a stream.

That said:

> part of the assignment is to create a front end view where the user can upload files and it loads the data into the database.

Allowing a user to upload data directly to your database via a file is not the best idea, and if the file is 600MB you’d want to use some service like Uppy.

But - if you do want to do this, start small. Upload a CSV with 10 records and test it out before going to 600MB :wink:

Typically, these would be done via a background data loading pipeline:

  • Upload file to S3
  • On S3 event trigger a background job
  • Job fetches data from S3
  • Job does a bulk load from S3 into RDS (e.g. via COPY)
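The S3-event step of that pipeline starts by pulling the bucket and key out of the event payload. The record shape below follows the standard S3 event notification structure; `fetchFromS3` and `bulkLoad` are placeholders for your SDK and loading code, not real APIs:

```javascript
// Sketch of the entry point for a background job triggered by an S3
// "ObjectCreated" event. Only the event parsing is concrete here; the
// fetch-and-load steps are placeholder assumptions.
function parseS3Event(event) {
  const record = event.Records[0]
  return {
    bucket: record.s3.bucket.name,
    // S3 delivers keys URL-encoded, with spaces as '+'.
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
  }
}

// async function handler(event) {
//   const { bucket, key } = parseS3Event(event)
//   const stream = await fetchFromS3(bucket, key) // e.g. S3 GetObject
//   await bulkLoad(stream)                        // e.g. COPY into RDS
// }
```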

> currently using FileReader on the front end to load in the file and then send it through mutations to the back end.

Try https://uppy.io/ to upload those large files.

Also - Netlify functions have a memory limit of 1GB and a runtime limit of 10 seconds. So I don’t think you’ll be able to process 600MB of data in GraphQL or a function.

Personally, to load 600MB of data I’d consider an alternative approach, like PostgreSQL’s COPY command (see the COPY page in the PostgreSQL documentation).

Cheers!

Thank you for the reply and the materials!

I have checked out the links you provided and will consult with the company.