Recursive Promises in Node

Recursive functions are useful for many things in JavaScript. In React, for example, you might want to traverse all the children of a higher-order component, including nested ones. There are many valid use cases for these functions.

Last week I realized that I needed to call a promise-returning function recursively and resolve it at a point I couldn't know in advance. Why? In this case, I had to build a website scraper... I know, it's the best (not). I needed a way to keep clicking the next page link on a website until it no longer existed, and then resolve with all the URLs found on those pages.

Initially I was stuck because I couldn't find much help online. After a few rounds of trial and error, I realized I was overthinking it! Let's jump into the code:

const renderURLs = (site, url, existingData = []) => 
  new Promise(async resolve => {
    let response = await renderPage(site, url);
    let nextURL = getNextUrl(response.$);

    if (!nextURL) return resolve([...response.data, ...existingData]);
    else renderURLs(site, nextURL, [...response.data, ...existingData]);
});

I thought this would work.

First, I call renderPage, which returns the page data and a function similar to jQuery (I know... believe me, I would rather use vanilla JS, but this scraper only exposes the page object like this!).

I pass that jQuery-like function to getNextUrl, which checks the DOM for a next page link and returns its URL, or false if it does not exist.

If there is no next URL, resolve with the current page's links combined with the links collected from the previous pages.

If there is a next URL, call renderURLs again to get more links, this time passing along the data collected so far.
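For illustration, here is a rough sketch of what a getNextUrl helper could look like. This is not the original implementation: the 'a.next' selector is an assumption about the page markup, and fake$ is a made-up stand-in for the jQuery-like function so the snippet runs on its own.

```javascript
// Hypothetical sketch: the 'a.next' selector is an assumption about the markup.
const getNextUrl = ($) => {
  // jQuery-style lookup; .attr() returns undefined when nothing matches.
  const href = $('a.next').attr('href');
  return href || false;
};

// Made-up stand-in for a jQuery-like function, for illustration only.
const fake$ = (links) => (selector) => ({
  attr: (name) => (links[selector] || {})[name],
});

console.log(getNextUrl(fake$({ 'a.next': { href: '/page/2' } }))); // /page/2
console.log(getNextUrl(fake$({}))); // false
```

Returning false instead of undefined keeps the "no next page" check in the caller explicit.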

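The one thing I got wrong: the recursive call returns a new promise, but nothing ever resolves the outer one, so the very first promise stays pending forever. You need to resolve each promise with the next one. So if we change that last else statement to: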

else resolve(
  renderURLs(site, nextURL, [...response.data, ...existingData])
);

The whole thing works! If you await renderURLs, it will wait until all of the recursive calls have resolved.
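If you want to try the fixed version without a real scraper, here is a minimal sketch that runs on its own. The pages object, renderPage, and getNextUrl below are in-memory stand-ins I made up for the demo, not the real scraper API.

```javascript
// Hypothetical in-memory "site": each page has some links and maybe a next page.
const pages = {
  '/1': { data: ['a', 'b'], next: '/2' },
  '/2': { data: ['c'], next: '/3' },
  '/3': { data: ['d'], next: null },
};

// Stand-ins for the scraper helpers (assumptions, not the real API).
const renderPage = async (site, url) => ({ data: pages[url].data, $: pages[url] });
const getNextUrl = (page) => page.next || false;

const renderURLs = (site, url, existingData = []) =>
  new Promise(async (resolve) => {
    const response = await renderPage(site, url);
    const nextURL = getNextUrl(response.$);
    const collected = [...response.data, ...existingData];

    if (!nextURL) return resolve(collected);
    // Resolving with a promise makes the outer promise adopt its eventual value.
    resolve(renderURLs(site, nextURL, collected));
  });

const urlsPromise = renderURLs('example', '/1');
urlsPromise.then((urls) => console.log(urls));
// [ 'd', 'c', 'a', 'b' ]
```

Note the order: because each call puts its own page's data in front of existingData, links from the last page come first.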

We can take this a step further and remove new Promise:

const renderURLs = async (site, url, existingData = []) => {
    let response = await renderPage(site, url);
    let nextURL = getNextUrl(response.$);

    if (!nextURL) return [...response.data, ...existingData];
    else return await renderURLs(site, nextURL, [
      ...response.data, 
      ...existingData
    ]);
};

An async function always returns a promise, so wrapping its body in new Promise is unnecessary. Thanks to @chrizworks for making me realize it was unneeded here! I decided to keep the top example because it helps show that async/await is built on the same promises.
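One practical difference is worth sketching. With new Promise(async resolve => ...), an error thrown inside the executor never reaches the outer promise, so it stays pending. In the plain async version, a rejection travels back up through every recursive call to whoever awaits the first one. The renderPage and getNextUrl below are made-up stand-ins where the second page fails to load.

```javascript
// Hypothetical stand-ins: page '/2' fails to render.
const renderPage = async (site, url) => {
  if (url === '/2') throw new Error(`failed to render ${url}`);
  return { data: ['a'], $: { next: '/2' } };
};
const getNextUrl = (page) => page.next || false;

const renderURLs = async (site, url, existingData = []) => {
  const response = await renderPage(site, url);
  const nextURL = getNextUrl(response.$);
  const collected = [...response.data, ...existingData];

  if (!nextURL) return collected;
  // Returning the recursive promise lets its rejection propagate to the caller.
  return renderURLs(site, nextURL, collected);
};

const result = renderURLs('example', '/1').catch((err) => `caught: ${err.message}`);
result.then((message) => console.log(message));
// caught: failed to render /2
```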

Conclusion

I hope this helped someone else understand the power of recursive promises in JS. This concept may seem extremely simple to some, but I hope others won't get stuck like I did. If you know a better way to do this, or if you have any comments or concerns, let me know below or on Twitter @zachcodes.