Node.js Heap Overflow error

I am getting this error on AWS EC2 after 1-2 days of running this code

ERROR

<--- Last few GCs --->
st[10805:0x41cdff0]  7130379 ms: Mark-sweep 33.2 (78.7) -> 21.1 (75.8) MB, 13.8 / 0.1 ms  (+ 23.1 ms in 23 steps since start of marking, biggest step 4.3 ms, walltime since start of marking 160 ms) final$

<--- JS stacktrace --->
Cannot get stack trace in GC.
FATAL ERROR: Scavenger: promoting marked
Allocation failed - process out of memory
1: node::Abort() [node]
2: 0x12b288c [node]
3: v8::Utils::ReportOOMFailure(char const*, bool) [node]
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
5: 0xa96bfb [node]
6: void v8::internal::ScavengingVisitor<(v8::internal::MarksHandling)0, 
(v8::internal::PromotionMode)0, (v8::internal::LoggingAndProfiling)1>::EvacuateObject<(v8::internal::ScavengingVisitor<(v8::intern$
 7: v8::internal::Scavenger::ScavengeObject(v8::internal::HeapObject**, v8::internal::HeapObject*) [node]
 8: v8::internal::Heap::IteratePromotedObjectPointers(v8::internal::HeapObject*, unsigned char*, unsigned char*, bool, void (*)(v8::internal::HeapObject**, v8::internal::HeapObject*)) [node]
 9: void v8::internal::BodyDescriptorBase::IterateBodyImpl<v8::internal::ObjectVisitor>(v8::internal::HeapObject*, int, int, v8::internal::ObjectVisitor*) [node]
10: void v8::internal::BodyDescriptorApply<v8::internal::CallIterateBody, void, v8::internal::HeapObject*, int, v8::internal::ObjectVisitor*>(v8::internal::InstanceType, v8::internal::HeapObject*, int, v$
11: v8::internal::Heap::DoScavenge(v8::internal::ObjectVisitor*, unsigned char*, v8::internal::PromotionMode) [node]
12: v8::internal::Heap::Scavenge() [node]
13: v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
14: v8::internal::Heap::CollectGarbage(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*, v8::GCCallbackFlags) [node]
15: v8::internal::Factory::NewRawTwoByteString(int, v8::internal::PretenureFlag) [node]
16: v8::internal::Factory::NewStringFromUtf8(v8::internal::Vector<char const>, v8::internal::PretenureFlag) [node]
17: v8::String::NewFromUtf8(v8::Isolate*, char const*, v8::String::NewStringType, int) [node]
18: node::StringBytes::Encode(v8::Isolate*, char const*, unsigned long, node::encoding) [node]
19: void node::Buffer::StringSlice<(node::encoding)1>(v8::FunctionCallbackInfo<v8::Value> const&) [node]
20: 0x33c699f18dcf

My main function is an async while loop that looks like this; it is a controller function for an Express route:

function controller(cb) {
  return new Promise((resolve, reject) => {
    let killed = false;
    (async() => {
      let isEmpty = false;
      while (!killed && !isEmpty) {
        const code = await processBatch();
        if (code === EMPTY_QUEUE) {
          isEmpty = true;
          console.log('ss');
          resolve(false);
        }
      }
    })();
    cb()
      .then((state) => killed = state);
  });
}

Here, processBatch() can take around 10 seconds to resolve its promise.

NOTE: processBatch will never return EMPTY_QUEUE, and killed is never set to true by the callback.

Taking this into account, can someone please tell me why this controller function consumes so much memory over time? Am I doing something that stops Node from garbage-collecting data?
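In case it helps, heap growth can be watched from inside the process with process.memoryUsage() (a diagnostic sketch, not part of the app):

```javascript
// Diagnostic sketch: log V8 heap usage once a minute to watch for growth.
// process.memoryUsage() is a standard Node.js API.
const mb = (n) => (n / 1024 / 1024).toFixed(1);

function logHeap() {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  console.log(`rss=${mb(rss)} MB heapTotal=${mb(heapTotal)} MB heapUsed=${mb(heapUsed)} MB`);
}

// unref() so this timer alone does not keep the process alive.
setInterval(logHeap, 60 * 1000).unref();
```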

-- UPDATE --

This is the router code which calls the controller function and ensures that no more than one controller is working at a time:

const query = require('../controllers/fetchContent').query;
const controller = require('../../storage/controllers/index').controller;

let isFetching = false;
let killed = false;

function killSwitch() {
  return new Promise((resolve, reject) => {
    setInterval(() => {
      if (killed) {
        resolve(killed);
      }
    }, 10000);
  });
}

module.exports = (app) => {
  app.get('/api', (req, res) => {
    res.setHeader('Content-Type', 'application/json');
    res.json({ "statusCode": 200, "body": "Hey" });
  });
  app.post('/', (req, res) => {
    if (!killed) {
      if (!isFetching) {
        isFetching = true;
        controller(killSwitch)
          .then((response) => {
            isFetching = response.isFetching;
          });
        res.send({
          success: true,
          message: 'Okay I will extract send the contents to the database'
        });
      } else {
        res.send({
          success: true,
          message: 'Already Fetching'
        });
      }
    } else {
      res.send({
        success: false,
        message: 'In killed State, start to continue'
      });
    }
  });
  app.post('/kill', (req, res) => {
    killed = true;
    isFetching = false;
    res.send(200, 'Okay I have stopped the fetcher process');
  });
  app.post('/alive', (req, res) => {
    killed = false;
    res.send({
      success: true,
      message: 'Now New req to / will be entertained'
    });
  });
  app.post('/api/fetch', query);
};

-- UPDATE 2 --

This is the processBatch() function. Its role is to get data from Amazon SQS and, after processing that data, send it to another Amazon SQS queue and notify subscribers via Amazon SNS:

async function processBatch() {
  // Wait for the promise returned after messages are retrieved from the queue.
  let data = await getDataFromQueue();
  let listOfReceipt = [];
  if (q.length() > 50) {
    // If the queue length is more than 50, wait for the queue to process the
    // previous data (done in order to put a max cap on the queue size).
    await sleep(400);
    console.log(q.length());
    return CLEAN_EXIT;
  }
  if (!data.Messages || !data.Messages.length) {
    pushSNS(null, true);
    pushDelete(null, true);
    return EMPTY_QUEUE;
  }
  try {
    for (let i = 0; i < data.Messages.length; i++) {
      data.Messages[i].Body = JSON.parse(data.Messages[i].Body);
      const URL = data.Messages[i].Body.url;
      const identifier = data.Messages[i].Body.identifier;
      // Also get the ReceiptHandle out of the message (to be used for deletion later on).
      listOfReceipt.push(data.Messages[i].ReceiptHandle);
      q.push(URL, async (err, html) => {
        if (err) {
          console.log(err);
        } else {
          await sendDataToQueue({ url: URL, content: html, identifier });
          pushDelete(data.Messages[i].ReceiptHandle);
          pushSNS();
        }
      });
    }
  } catch (e) {
    // Simply ignore any error and delete that msg.
    console.log(e);
    pushSNS(null, true);
    pushDelete(null, true);
    return CLEAN_EXIT;
  }
  return CLEAN_EXIT;
}

Here q is an async.queue, and its worker function extractContent fetches the content of the provided URL.

These are the helper functions for this module:

const q = async.queue((URL, cb) => {
  extractContent(URL, array)
    .then((html) => {
      cb(null, html);
    })
    .catch((e) => {
      cb(e);
    });
}, concurrency);

function internalQueue(cb) {
  let arr = [];
  return function (message, flag) {
    arr.push(message);
    if (arr.length >= 10 || flag) {
      arr = [];
      cb();
    }
  };
}

function sleep(delay) {
  return new Promise((resolve, reject) => {
    setTimeout(() => resolve(), delay);
  });
}

// This is done in order to do things in a batch, which reduces cost.
let pushSNS = internalQueue(sendDataToSNS);
let pushDelete = internalQueue(deleteDataFromSQS);

Answer

First of all, your controller function returns a Promise that never resolves, given your statement that processBatch will never return EMPTY_QUEUE. I assume that you store the returned Promises somewhere, and each of them consumes memory.
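To illustrate the retention (a minimal sketch, not the app's code): a Promise that never settles keeps every .then callback attached to it, and everything those callbacks close over, reachable for as long as the Promise itself is reachable:

```javascript
// Sketch: `p` never settles, so the reaction attached with .then() is never
// released; `big` stays reachable through that callback's closure for as
// long as `p` is referenced.
function pendingForever() {
  return new Promise(() => { /* resolve/reject never called */ });
}

const p = pendingForever();
const big = Buffer.alloc(1024 * 1024); // ~1 MB pinned via the closure below
p.then(() => console.log(big.length)); // never runs, never collected
```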

Also, each time you call the controller function, it creates a new loop that calls processBatch indefinitely. So if controller is a controller function for an Express route, then each time someone requests that route, you create a new loop that calls processBatch forever. I bet this is not the desired behavior, and it definitely pins a lot of memory.

Updated due to new details:

Currently, if someone POSTs to /kill and then to /alive without a delay, they will be able to POST to / and start another loop in controller, because processBatch can take around 10 seconds to resolve its promise. This way, by making several repeated POSTs to /kill -> /alive -> /, someone could effectively DoS your app. This is probably what is happening.
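One possible guard (a sketch with hypothetical names, not a drop-in fix): track the in-flight run itself instead of a boolean, so a new run can only start once the previous run's promise has actually settled:

```javascript
// Sketch: `currentRun` holds the active run's promise; while it is non-null,
// new runs are refused regardless of how /kill and /alive are toggled.
let currentRun = null;

function startRun(runController) {
  if (currentRun) return false; // previous run has not settled yet: refuse
  currentRun = runController().finally(() => {
    currentRun = null; // only now may a new run begin
  });
  return true;
}
```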

Another update

This code, q.push(URL, async (err, html) => {, enqueues a new task and attaches a callback that is called after the task is fulfilled. The queue's counter decreases before that callback finishes, because the callback is asynchronous (async) and performs another request: await sendDataToQueue({url: URL, content: html, identifier});.

As you can see, if sendDataToQueue executes more slowly than the queue drains, these callbacks accumulate and consume memory.
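One way to avoid that accumulation (a sketch using plain promises rather than async.queue, with a hypothetical worker function): make the concurrency limit cover the downstream send as well, so a slot is only freed after the whole chain, including sendDataToQueue, has resolved:

```javascript
// Sketch: a tiny limiter where each slot is held until the worker's whole
// promise chain (fetch AND downstream send) has settled, so callbacks
// cannot pile up faster than they are consumed.
async function runWithLimit(items, worker, limit) {
  const inFlight = new Set();
  for (const item of items) {
    const p = worker(item).finally(() => inFlight.delete(p));
    inFlight.add(p);
    if (inFlight.size >= limit) {
      await Promise.race(inFlight); // backpressure: wait for a free slot
    }
  }
  await Promise.all(inFlight); // drain the remaining work
}
```

With async.queue the equivalent change would be to await sendDataToQueue inside the queue's worker itself, so that q.length() only counts work that is completely finished.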
