Creating a searchable database from uploaded PDFs

What up folks,

I’m a self-taught coder who’s been using rw for a side project since I heard about it on the React Podcast. I’m currently stuck trying to implement the following, and thought I would ask the rw community, since everyone has been awesome on Discord so far with my dumb noob questions.

I want an authenticated user to be able to search by word or document title through almost 100 uploaded PDFs that are in an AWS S3 bucket (these are reference docs, and all users will have read access to view all uploaded PDFs). I have already built the file upload to AWS by following a tutorial someone posted here (thanks [Tobbe] for the writeup!). If anyone has any guidance on OCR, or lessons learned trying to implement something similar, I’m all ears. Thanks for reading!

@d0za hi!

Your best bet to implement that search is to use Algolia.

Searching PDFs isn’t like searching plain text, but Algolia has some tips on indexing and then searching longer-form content:

And ways to extract the content from PDFs:
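
Once you have the text out of a PDF (however you end up extracting it), the usual advice for long documents is to split them into smaller records before indexing, so a search hit points at a passage instead of a whole file. Here is a rough sketch of that splitting step in TypeScript. Everything in it is an assumption for illustration: the `PdfChunkRecord` shape, the `chunkPdfText` helper, the `s3Key` field and the 1000 character limit are all made up, and only `objectID` is a name Algolia itself expects on a record. It doesn't call Algolia or S3 at all, it just prepares the records.

```typescript
// Hypothetical record shape for one chunk of one PDF.
// Only `objectID` is required by Algolia; the other fields are assumptions.
interface PdfChunkRecord {
  objectID: string; // unique per chunk
  title: string; // repeated on every chunk so searching by title still works
  s3Key: string; // assumed: key of the original PDF in the bucket
  chunkIndex: number; // position of this chunk within the document
  content: string; // the searchable text
}

// Split already-extracted PDF text into records of at most `maxChars`
// characters each, breaking on blank lines (paragraphs) where possible.
function chunkPdfText(
  title: string,
  s3Key: string,
  text: string,
  maxChars = 1000
): PdfChunkRecord[] {
  // Paragraphs are separated by blank lines; collapse inner whitespace.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.replace(/\s+/g, ' ').trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = '';

  for (const para of paragraphs) {
    // Hard-split any paragraph that is longer than the limit on its own.
    for (let i = 0; i < para.length; i += maxChars) {
      const piece = para.slice(i, i + maxChars);
      if (current.length > 0 && current.length + 1 + piece.length > maxChars) {
        // Adding this piece would overflow the chunk, so start a new one.
        chunks.push(current);
        current = piece;
      } else {
        current = current.length > 0 ? current + ' ' + piece : piece;
      }
    }
  }
  if (current.length > 0) {
    chunks.push(current);
  }

  return chunks.map((content, chunkIndex) => ({
    objectID: `${s3Key}#${chunkIndex}`,
    title,
    s3Key,
    chunkIndex,
    content,
  }));
}
```

You would call it as something like `chunkPdfText('Safety Manual', 'docs/safety-manual.pdf', extractedText)` and push the resulting array to your index with whichever Algolia client you use. Since one PDF becomes several records, the same document can show up more than once in the results; Algolia has a `distinct` setting for grouping by an attribute that is worth looking up in their docs for that.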