Run 1000 requests so that only 10 run at a time

With Node.js, I want to http.get a number of remote URLs in such a way that only 10 (or n) run at a time.

I also want to retry a request (up to m times) if an exception occurs locally, but when the response carries an error status code (5XX, 4XX, etc.), the request counts as valid.

This is really hard for me to wrap my head around.

Problems:

  1. Cannot try-catch http.get as it is async.
  2. Need a way to retry a request on failure.
  3. I need some kind of semaphore that keeps track of the currently active request count.
  4. When all requests have finished, I want the request URLs and response status codes in a list that I can sort/group/manipulate, so I need to wait for all requests to finish.

It seems that promises are recommended for every async problem, but I end up nesting too many of them and the code quickly becomes indecipherable.

Answers:

Answer

There are several ways to keep only 10 requests running at a time.

  1. Async Library - Use the async library with the .parallelLimit() method where you can specify the number of requests you want running at one time.

  2. Bluebird Promise Library - Use the Bluebird promise library and the request library to wrap your http.get() into something that can return a promise and then use Promise.map() with a concurrency option set to 10.

  3. Manually coded - Code your requests manually to start up 10 and then each time one completes, start another one.
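Option 3 can be sketched without any library. The following is only a minimal sketch under stated assumptions, not a definitive implementation: `mapWithLimit` and `worker` are hypothetical names, and `worker` stands for any function that returns a promise (for example, a wrapper around http.get). It starts up to `limit` "lanes", and each lane picks up the next unstarted item as soon as its previous one completes.

```javascript
// Hypothetical helper: run worker(item) for every item, at most `limit` at a time.
// Results come back in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
    const results = new Array(items.length);
    let next = 0;    // index of the next item to start

    // Each lane keeps pulling the next unstarted item until none are left.
    async function lane() {
        while (next < items.length) {
            const i = next++;
            try {
                results[i] = await worker(items[i], i);
            } catch (err) {
                // record the failure so the remaining items still run
                results[i] = { error: err };
            }
        }
    }

    const laneCount = Math.min(limit, items.length);
    const lanes = [];
    for (let n = 0; n < laneCount; n++) {
        lanes.push(lane());
    }
    await Promise.all(lanes);
    return results;
}
```

With a promise-returning worker of your own, a call such as `mapWithLimit(remoteUrls, 10, worker)` would resolve once every URL has been processed, which also covers the "wait for all requests to finish" part of the question.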

In all cases, you will have to write the retry code yourself. As with all retry code, you have to decide carefully which types of errors to retry, how soon to retry them, how much to back off between attempts, and when to give up (none of which you have specified).
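Those retry decisions can be kept in one small helper. This is a sketch under assumptions, not a recommendation of specific values: `withRetry`, `shouldRetry`, `retries` and `baseDelay` are hypothetical names, and the exponential backoff shown is just one possible strategy.

```javascript
// Hypothetical helper: resolve after `ms` milliseconds.
function delay(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: call fn() until it succeeds, the retries run out,
// or shouldRetry(err) says the error is not worth retrying.
async function withRetry(fn, { retries = 3, baseDelay = 500, shouldRetry = () => true } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (err) {
            if (attempt >= retries || !shouldRetry(err)) {
                throw err;    // give up and let the caller see the error
            }
            // exponential backoff: baseDelay, 2 * baseDelay, 4 * baseDelay, ...
            await delay(baseDelay * 2 ** attempt);
        }
    }
}
```

Because the helper returns a single promise that settles only after the last attempt, it can be used as the per-URL task in any of the three approaches without disturbing the concurrency limit.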

Other related answers:

How to make millions of parallel http requests from nodejs app?

Million requests, 10 at a time - manually coded example


My preferred method is with Bluebird and promises. Including retry and result collection in order, that could look something like this:

const request = require('request');
const Promise = require('bluebird');
// the promisified request.get resolves with the response object
const get = Promise.promisify(request.get);

const remoteUrls = [/* ... */];    // large array of URLs

const maxRetryCnt = 3;
const retryDelay = 500;

// Hypothetical helper: decide which errors are worth retrying.
// The codes listed are only an example; pick your own retry strategy.
function isRetryable(err) {
    return ['ECONNRESET', 'ETIMEDOUT', 'ECONNREFUSED'].includes(err.code);
}

Promise.map(remoteUrls, function(url) {
    let retryCnt = 0;
    function run() {
        return get(url).then(function(result) {
            // any HTTP response (including 4XX and 5XX) ends up here
            // do whatever you want with the result here
            return result;
        }).catch(function(err) {
            // decide what your retry strategy is here
            // catch all errors here so other URLs continue to execute
            if (isRetryable(err) && retryCnt < maxRetryCnt) {
                ++retryCnt;
                // try again after a short delay
                // chain onto previous promise so Promise.map() is still
                // respecting our concurrency value
                return Promise.delay(retryDelay).then(run);
            }
            // make value be null if no retries succeeded
            return null;
        });
    }
    return run();
}, {concurrency: 10}).then(function(allResults) {
    // everything is done here; allResults holds the results in the same
    // order as remoteUrls, with null for URLs that ultimately failed
});
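If you would rather not depend on the request library, the built-in http.get can be wrapped in a promise by hand. This is a minimal sketch assuming plain http:// URLs; `getStatus` is a hypothetical name. It resolves for any HTTP response, including 4XX and 5XX, and rejects only on local failures, which is the distinction the question asks for.

```javascript
const http = require('http');

// Hypothetical helper: resolve with { url, statusCode } for any HTTP response,
// reject only when the request itself fails (invalid URL, DNS error,
// refused connection, ...).
function getStatus(url) {
    return new Promise((resolve, reject) => {
        const req = http.get(url, (res) => {
            res.resume();    // discard the body so the socket is released
            resolve({ url, statusCode: res.statusCode });
        });
        // asynchronous request errors arrive as an 'error' event, not as a
        // thrown exception, which is why try-catch around http.get does not help
        req.on('error', reject);
    });
}
```

The resolved objects already have the shape of the "list of URLs and status codes" the question wants, so they can be sorted or grouped once all requests have settled.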
Answer

The simple way is to use the async library; it has a .parallelLimit method that does exactly what you need.
