Thinking about patterns for services, GraphQL, Apollo, cells, etc

Please forgive that I’m about to ask several naive questions in one post. (I think they’re related enough that maybe it would be easier for someone to answer in just one post than across multiple. If I should split them, just let me know.)

In /api/src/services/, I’m fetching third-party data as explained at https://redwoodjs.com/cookbook/using-a-third-party-api#the-service (see my real-world example at https://github.com/ryancwalsh/VoteQuickly/blob/4f307daebac6be55d8cf4b03fcc1ace9c68c5a45/api/src/services/scraper.ts#L87 ). But I’d prefer to create a table in the DB to cache results. I know how to run the migration to create a table. Perhaps I’ll make something like:

model CachedQuery {
  id        Int       @id @default(autoincrement())
  key       String
  value     String?
  expiresAt DateTime?
  createdAt DateTime  @default(now())
}
  1. Once I have that table, how can I (within that service file in the back-end /api/src/services/scraper.ts) read from the table to see if a usable record exists (and if none does, how can I save a record to the DB after fetching fresh data from the 3rd party API)? I’m new to useQuery and useMutation and haven’t been able to figure out how to use those hooks outside of JSX components.

  2. Similarly, on the front-end in /web/src/ how can I save a record to the DB after fetching fresh data from the 3rd party API? E.g. https://github.com/ryancwalsh/VoteQuickly/blob/4f307daebac6be55d8cf4b03fcc1ace9c68c5a45/web/src/components/Map.tsx#L62 is where I’m racking up a Google Maps bill because I’m not sure how to read from and write to my CachedQuery model given that this geocodeAddress function isn’t a component. I’d love to only ping the Google Maps API for new queries or where my cache is stale.

  3. The way I’m using cells currently feels bad and wrong, and I think it’s because I’m misunderstanding something. See these 3 files: https://github.com/ryancwalsh/VoteQuickly/blob/4f307daebac6be55d8cf4b03fcc1ace9c68c5a45/web/src/pages/HomePage/HomePage.js#L21 and https://github.com/ryancwalsh/VoteQuickly/blob/4f307daebac6be55d8cf4b03fcc1ace9c68c5a45/web/src/components/GwinnettCell/GwinnettCell.js#L6 and https://github.com/ryancwalsh/VoteQuickly/blob/4f307daebac6be55d8cf4b03fcc1ace9c68c5a45/api/src/graphql/scraper.sdl.js#L12

    Imagine I wanted to be able to ingest data from hundreds of unique API endpoints and transform all of their results to be uniform.

    E.g. I could imagine running a really successful open source project where an individual could submit a new file into a special folder, and that file contained the logic for transforming data from a certain API into a standardized format expected by this project. And hundreds of people could add separate scraper files like this.

    Ahh, in writing this question, before I even finished it, I have an idea for a better approach for the cells. What I’ll play with is:

    The back-end would scan a directory of all of the scraper files that exist in the project, and somehow (TBD) the front-end would be aware of that array of file names, and that’s how it would know what options to show in the dropdown. And when the user makes a choice, the project’s single cell file would call a scrapePage() function and pass that particular pageName as the param. The project would have just one sdl file (with just one “type” to spec out the expected data format and just one Query, and that query would be scrapePage(pageName)).

    And in my particular case, I’d also change my approach to have each special scraper file return not just the array of rows but also some extra meta data, such as geo location and name. (So far, I’ve been defining a new cell for each scraper, and then I had this extra meta data in the individual cell. This wasn’t a good approach.)

Well, even before anyone has responded, I think writing this out has already solved 33% of what I came here for this afternoon.

Thanks, Redwood community!

Ok I actually do have 3 questions because I’m stuck even on #3. For now, ignore the fancy “scan-the-directory for files” and “have the front-end dropdown automatically know which scraper files are available in the back-end” ideas. Let’s say I have:

api\src\services\scraper.ts
api\src\lib\scrapers\GA\Cobb.ts
api\src\lib\scrapers\GA\Gwinnett.ts
api\src\lib\scrapers\FL\Miami-Dade.ts
[...etc, more scrapers]
web\src\components\ScraperCell\ScraperCell.js
web\src\pages\HomePage\HomePage.js

What I’m stuck on is: how can the dropdowns in HomePage (for USA state and county) pass those 2 params to ScraperCell so that it can pass those 2 params to the gql query that it finds in api\src\services\scraper.ts?

If I could do that, I imagine I’d have an import line for every scraper into api\src\services\scraper.ts, and its single function would derive a filename from the combination of the USA state and county params, and then call the function from whichever imported file would be relevant for those params (maybe like this).

@ryancwalsh

Have a look at GraphQL queries

And query arguments and variables

You can pass your voting location id to a gql endpoint that will resolve it and call a service with the variables.

Once you have that value, you could figure out which scraper to call, or check whether you’ve saved and cached some recent data and query the database instead, and only if it’s stale call the external API (well, the web page).

The onChange (or some other event) on the select would call a function that does that query fetch. Or you could pass the value into the cell component, which would do the fetch. I’m not sure if you have one cell with the dropdown and the rendered results, or several discrete ones.
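
The stale check itself can be tiny. A sketch (the function name and record shape are made up, matching the nullable expiresAt column in your CachedQuery model):

```javascript
// Sketch (not an actual implementation): a cached record is usable if it
// exists and hasn't expired. `expiresAt` matches the nullable DateTime
// column in the proposed CachedQuery model.
function isUsable(record, now = new Date()) {
  if (!record) return false
  if (!record.expiresAt) return true // no expiry set: treat as always fresh
  return record.expiresAt > now
}
```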

If you want to map a key to a scraper, look at how a decoder is determined here in the Redwood auth/api packages:

Based on an authType.

There is a map of types to decoders.

You could do something similar: have a set of supported scrapers and look one up by the key, that is, the location value.
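
A sketch of that idea (the keys and scraper bodies here are hypothetical stand-ins, not your real scraper modules):

```javascript
// Sketch: a map of supported location keys to scraper functions,
// analogous to the authType -> decoder map in Redwood's auth packages.
// The keys and scraper bodies below are hypothetical stand-ins.
const scrapers = {
  'GA/Cobb': () => ({ county: 'Cobb', rows: [] }),
  'GA/Gwinnett': () => ({ county: 'Gwinnett', rows: [] }),
  'FL/Miami-Dade': () => ({ county: 'Miami-Dade', rows: [] }),
}

function scrapePage(pageName) {
  const scraper = scrapers[pageName]
  if (!scraper) {
    throw new Error(`Unsupported scraper: ${pageName}`)
  }
  return scraper()
}
```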

Thanks @dthyresson. I’d already read https://redwoodjs.com/docs/cells#query and https://graphql.org/learn/queries/ and am already using queries like https://github.com/ryancwalsh/VoteQuickly/blob/4f307daebac6be55d8cf4b03fcc1ace9c68c5a45/web/src/components/GwinnettCell/GwinnettCell.js#L4

But all 3 of my questions still stand, even knowing all that.

Because for #1 and #2, I seem unable to call useQuery outside a component. Trying it always causes an error with a hint that it must be called within a component.

And #3, I’m trying to find a more elegant approach than writing out ~100 cells (1 for each county) and 100 query definitions when they are really mostly the same thing at the higher level, and so the “switch” between the 100 different counties should happen “further down”.

The onChange of the dropdowns in HomePage can combine to affect a state variable (e.g. if someone chooses GA and Gwinnett, then I’d setState for a string ‘GA/Gwinnett’, following the back-end file-naming pattern). But then how can that state var be used in the gql in the cell?

Ohhh I’m finally noticing the <BlogPostsCell numberToShow={3} /> example I’ve been searching for but somehow kept overlooking at https://redwoodjs.com/docs/cells#query. So it seems like this magically takes care of it, and I think my #3 is solved:

export const QUERY = gql`
  query($numberToShow: Int!) {
    posts(numberToShow: $numberToShow) {
      id
      title
    }
  }
`
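
If I understand the docs right, a cell forwards its props as the query’s variables by default, roughly equivalent to this beforeQuery (a simplified sketch, not Redwood’s exact implementation):

```javascript
// Simplified sketch of how cell props become GraphQL variables:
// Redwood's default beforeQuery (roughly) passes the props straight
// through as the query's variables.
const beforeQuery = (props) => {
  return { variables: props }
}
```

So <BlogPostsCell numberToShow={3} /> ends up running the query with $numberToShow set to 3.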

Thanks for your decoder mapping example. I’ll explore.

Question #3 is solved like @dthyresson suggested: https://github.com/ryancwalsh/VoteQuickly/blob/2ad67a3ee878153b4af82f0581be84172f0711a6/api/src/services/scraper.ts#L4

Using just one cell that relies on multiple scraper files definitely feels better than my previous approach of using multiple cells. Thank you so much, David T!

If anyone has suggestions about questions #1 or 2 above, I’m super curious to learn. Thanks!

The quick answer is that you would not do a query/mutation from the web side; rather, you save that data in a service.

So:

Web -> cell -> gql -> get Gwinnett

api -> call service -> get Gwinnett -> look for a saved scrape for Gwinnett -> found? return it; not found? scrape -> save to db -> return the scrape to the web client

Note that lambda functions have to return in under 10 seconds; that whole adventure has to complete in under 10.

There are ways to do this as a background job, but that’s a much bigger discussion.

Have a look at services in

// api/src/services/contacts/contacts.js

import { db } from 'src/lib/db'

export const contacts = () => {
  return db.contact.findMany()
}

export const createContact = ({ input }) => {
  return db.contact.create({ data: input })
}

That saves a contact with the attributes to the database. You’d save the result of your scraping.

If you want, you could store the entire scrape in a single JSON column instead of parsing it out into individual attributes.
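
To make the whole flow above concrete, here’s a sketch. The store and the scraper are injected fakes so the logic is self-contained; in a real service the store calls would be Prisma calls (e.g. db.cachedQuery.findFirst and db.cachedQuery.create), and all of this stays on the api side:

```javascript
// Sketch of the cache-then-scrape flow described above. `store` stands in
// for Prisma and `scrape` for the real scraper; both are injected so the
// logic is easy to follow. Not a drop-in implementation.
async function getPage(key, { store, scrape, ttlMs = 60 * 60 * 1000 }) {
  const cached = await store.find(key)
  if (cached && cached.expiresAt > new Date()) {
    return cached.value // fresh enough: skip the external call entirely
  }
  const value = await scrape(key) // stale or missing: hit the external page
  await store.save({ key, value, expiresAt: new Date(Date.now() + ttlMs) })
  return value
}
```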

Oh, it sounds like Apollo should be completely uninvolved, and I can just use Prisma directly for DB queries in back-end services. That probably should have been obvious. I think I’m starting to get it. I’ll explore that. Thanks!

I got the back-end caching working a few days ago using Prisma directly instead of involving Apollo: https://github.com/ryancwalsh/VoteQuickly/commit/111364193c0c75858dd465d9df3507c3c67231ed

I’m still exploring how to solve my second question above in Thinking about patterns for services, GraphQL, Apollo, cells, etc : how to call Prisma functions from the front-end outside of a component (such as at https://github.com/ryancwalsh/VoteQuickly/blob/b099148/web/src/components/Map.ts#L64, which gets called from the Success function of the main ScraperCell ).

Also posted at https://stackoverflow.com/q/64469000/470749

If anyone here has any ideas, I’d love to hear.

@ryancwalsh I tried to look through all of the things you linked. I’m still a little lost on what exactly you are trying to do, but it seems you are having some trouble with cells. From looking at some of the stuff you are doing, you might be better off dropping cells for this and using Apollo’s useLazyQuery, implementing the logic yourself. Looking at what’s going on in your different cells, you are just updating the geolocation coordinates. I don’t think you need a bunch of separate components for each county; you can create one that takes props and updates a single component. You should be able to import useLazyQuery directly from @redwoodjs/web.

@ryancwalsh as @KrisCoulson noted

but it seems you are having some trouble with cells

Let’s take a step back and look at how “Redwood is Organized”

Redwood places both the frontend and backend code in a single monorepo.

/web contains the frontend and is served as static files through a CDN (and automatically code-split for you).

/api contains the backend serverless functions (a GraphQL API by default) that your frontend will call when it needs some dynamic data.

The “dotted line” between those two sides is important here, because your question (“how to call Prisma functions from the front-end outside of a component”) tries to cross that line … but the only things here that can cross the line are the blue arrows on the Client end.

And they can only cross it via the GraphQL API (or an HTTP fetch to a function, but since we’re talking cells here, we can ignore that for simplicity).

So, “how to call Prisma functions from the front-end outside of a component” … the answer is that you don’t, directly.

A cell calls a GraphQL API Query that in turn uses Prisma to query a database.

As I said in my prior post, if you want to persist the “scraped data”, you’d do that in a service.

^^ so a mutation from within a Cell component using a Service on the backend, correct?

Thanks @KrisCoulson. I agree; in an earlier comment I mentioned that I condensed down to just one cell rather than one per county/scraper.

So, that part is solved. I’ll catch up on the below comments now about how to use Apollo for the rest of what I’m trying. Thanks.

@dthyresson @thedavid I think I got it working!

(Well, there might be some final kinks to work out, and I wouldn’t actually use this architecture because it would hammer the server with too many AJAX requests per page load, but I think I figured out the part that was mysterious to me.)

In the Success function of my ScraperCell, I now have:

return (
    <div id="geoCellsAndResultsTable">
      {waitTimesWithColors.map((waitTimeWithColor, index) => {
        const label = getMarkerLabel(index);
        const cachedQueryKey = getCachedQueryKeyFromAddress(waitTimeWithColor.address);
        return (
          <GeoCell
            waitTimeWithColor={waitTimeWithColor}
            cachedQueryKey={cachedQueryKey}
            expiresAtCutoff={expiresAtCutoff}
            map={map}
            geocoder={geocoder}
            label={label}
            key={waitTimeWithColor.address}
          />
        );
      })}
      <ResultsTable rows={waitTimesWithColors} />
    </div>
  );

So, what’s new is that the Success function of ScraperCell now returns not just the ResultsTable component but also a GeoCell for each result / row in the table (and the number of results is not known ahead of time).

What was unintuitive to me was the idea that I was required to use a JSX component (since Apollo would never let me query or mutate GraphQL outside of a JSX component), but I didn’t want the cell component to actually display anything… because all that it needs to do in its Success is call addMarkerAndInfoWindow, which affects the div of the existing Google Map (but doesn’t display anything where the GeoCell was actually located).

So what I’m doing is returning an empty string for each function of GeoCell (Success, Loading, Empty, and Failure). https://github.com/ryancwalsh/VoteQuickly/blob/a3327d7923c340bfe3596bb8f0485911c5cba37e/web/src/components/GeoCell/GeoCell.tsx#L74

That feels weird, but I think maybe this is what you all were suggesting.

At least now I’m able to ensure that my front-end first checks my database for cached results before calling the Google Maps API from the front end.
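
For anyone following along, the “renders nothing” part of my GeoCell boils down to something like this sketch (prop names simplified; returning null works the same as my empty string):

```javascript
// Sketch of a "headless" cell Success component: it performs its side
// effect (adding a marker to the already-rendered map) and renders
// nothing where the cell itself sits. Prop names here are simplified.
const Success = ({ cachedQuery, addMarkerAndInfoWindow }) => {
  addMarkerAndInfoWindow(cachedQuery) // affects the map div elsewhere
  return null // nothing displayed at the GeoCell's own location
}
```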