  1. Help-Desk
  2. HELP-15954

[fiware-askbot] I have a problem/question regarding the init procedure

    Details

    • Type: Monitor
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 2021
    • Component/s: None

      Description

      Created question in FIWARE Q/A platform on 19-06-2019 at 21:06
      Please, ANSWER this question AT https://ask.fiware.org/question/1085/i-have-a-problemquestion-regarding-the-init-procedure/

      Question:
      I have a problem/question regarding the init procedure

      Description:
      Hello all,

      I'm trying to deploy the Orion ContextBroker on an OpenShift/OKD (Kubernetes) cluster, and I'm having a problem with its initialization time during deployment.

      I'm using the 2.2.0 release tag of the ContextBroker with MongoDB 3.2.0. The startup args for Orion are:
      "-ipv4 -reqPoolSize 100 -notificationMode threadpool:10000:50 -statNotifQueue -statCounters -statSemWait -statTiming -relogAlarms -httpTimeout 100000"
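The flag string above can be split into its parts to see what each setting requests. This is a minimal sketch, not part of Orion: `parse_orion_args` and `parse_threadpool` are hypothetical helper names. It assumes the `threadpool:q:n` form described in the Orion command-line documentation (q = notification queue size, n = number of worker threads) and that `-httpTimeout` is expressed in milliseconds.

```python
import shlex


def parse_orion_args(arg_string: str) -> dict:
    """Split an Orion-style flag string into {flag_name: value}.

    Hypothetical helper: a flag that is followed by another flag (or by
    nothing) is treated as a boolean switch and mapped to True.
    """
    tokens = shlex.split(arg_string)
    parsed = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i].lstrip("-")
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is not None and not nxt.startswith("-"):
            parsed[flag] = nxt  # flag with a value, e.g. -reqPoolSize 100
            i += 2
        else:
            parsed[flag] = True  # switch, e.g. -ipv4
            i += 1
    return parsed


def parse_threadpool(mode: str) -> tuple:
    """Return (queue_size, worker_threads) for a 'threadpool:q:n' mode string."""
    parts = mode.split(":")
    if len(parts) != 3 or parts[0] != "threadpool":
        raise ValueError(f"not a threadpool:q:n mode: {mode!r}")
    return int(parts[1]), int(parts[2])


args = parse_orion_args(
    "-ipv4 -reqPoolSize 100 -notificationMode threadpool:10000:50 "
    "-statNotifQueue -statCounters -statSemWait -statTiming "
    "-relogAlarms -httpTimeout 100000"
)
queue_size, workers = parse_threadpool(args["notificationMode"])
print(queue_size, workers)                   # 10000 50
print(int(args["httpTimeout"]) / 1000, "s")  # 100.0 s, assuming milliseconds
```

Read this way, the configuration asks for a 10000-entry notification queue drained by 50 threads, and an outgoing HTTP timeout of roughly 100 seconds.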

      The time required for the app to become available is inconsistent.
      Sometimes the deployment runs "smoothly" and sometimes the app fails to start within a reasonable time.

      The initialization seems to freeze at a certain point, shown in the log excerpt [1]. The actual service isn't started (lsof -i -n -P doesn't show any process using port 1026).
      I use standard health checks which basically run "curl localhost:1026/version". I've tried modifying the timeouts and also the initial delay before the probe fires. Even with a 360 s (6 minute) delay I don't get consistent deployments!
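The two checks described above (is anything listening on port 1026, and does `/version` answer) can be reproduced outside the cluster to time how long Orion actually takes to come up. This is a minimal sketch using only the Python standard library; `port_is_listening` and `wait_until_ready` are hypothetical names, and the polling loop only imitates a readiness probe, it is not how Kubernetes or OpenShift implement probes.

```python
import http.client
import socket
import time
import urllib.request


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Rough equivalent of looking for a listener with `lsof -i -n -P`.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(url: str, deadline_s: float, period_s: float = 2.0,
                     request_timeout_s: float = 1.0) -> bool:
    """Poll `url` until it answers HTTP 200, like a simple readiness probe.

    Returns True on the first 200 response, or False once `deadline_s`
    seconds have passed without one.
    """
    end = time.monotonic() + deadline_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout or non-2xx status (HTTPError is
            # an OSError subclass): treat all of them as "not ready yet".
            pass
        if time.monotonic() + period_s > end:
            return False
        time.sleep(period_s)
```

For example, `wait_until_ready("http://localhost:1026/version", deadline_s=360)` mirrors the 6-minute window mentioned above, and recording `time.monotonic()` before and after the call gives the actual startup time for each attempt.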

      I've tested with different resource allocations and this doesn't seem to be the problem.

      Also, checking the logs, I see some "odd" intervals in the initialization procedure. In the excerpts at the end of the message, [2], the last steps of the init procedure are executed (or, at least, logged) at precisely 1-minute intervals.
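Gaps like these can be located automatically by comparing the `time=` field of consecutive log lines. This is a minimal sketch assuming the timestamp format visible in excerpt [1] (`time=Wednesday 19 Jun 16:50:35 2019.407Z`); `parse_log_time` and `find_gaps` are hypothetical helpers, not Orion tools, and lines without a timestamp (such as the `{ buildinfo: 1 }` continuation) are skipped.

```python
import re
from datetime import datetime

# Matches the 'time=' field as it appears in excerpt [1]; the weekday
# name is skipped and the trailing 'Z' is left out of the captured text.
_TIME_RE = re.compile(
    r"time=\w+ (\d{1,2} \w{3} \d{2}:\d{2}:\d{2} \d{4}\.\d{1,6})Z"
)


def parse_log_time(line: str):
    """Return the timestamp of an Orion log line, or None if it has none."""
    match = _TIME_RE.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d %b %H:%M:%S %Y.%f")


def find_gaps(lines, threshold_s: float = 30.0) -> list:
    """Return (previous_line, line, gap_seconds) for consecutive timestamped
    lines whose timestamps are at least `threshold_s` seconds apart."""
    gaps = []
    prev_line, prev_ts = None, None
    for line in lines:
        ts = parse_log_time(line)
        if ts is None:
            continue  # continuation or truncated line without a timestamp
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap >= threshold_s:
                gaps.append((prev_line, line, gap))
        prev_line, prev_ts = line, ts
    return gaps
```

Run over a full startup log (for instance `with open("orion.log") as f: find_gaps(f)`, where the file name is only an example), each returned entry pairs the last line logged before a pause with the first line after it, which shows which init step the roughly 60-second waits sit between.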

      The problem is that once the readiness health check fails, the deployment fails as well. From what I saw, Orion uses a lot of RAM, which is not released even after the notification load disappears. The recommendation would be to restart the process, which in my case can be handled automatically by setting an upper memory limit for the container. So the initialization time matters again, also for the auto-scaling mechanism, etc.

      Any hints towards how to solve this problem would be much appreciated!

      Thanks,
      Dan

      ==========================

      [1] - last 10-12 lines from ContextBroker's log; DEBUG -t 0-255

      =============================
      time=Wednesday 19 Jun 16:50:35 2019.407Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[802]:getWriteConcern | msg=getWriteConcern()
      time=Wednesday 19 Jun 16:50:35 2019.407Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[807]:getWriteConcern | msg=Database Operation Successful (getWriteConcern)
      time=Wednesday 19 Jun 16:50:35 2019.407Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=mongoConnectionPool.cpp[240]:mongoConnect | msg=Active DB Write Concern mode: 1
      time=Wednesday 19 Jun 16:50:35 2019.431Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[691]:runCollectionCommand | msg=runCommand() in 'admin' collection: '{ buildinfo: 1 }'
      time=Wednesday 19 Jun 16:50:35 ... (more)

        Activity

        fla Fernando Lopez made changes -
        Fix Version/s 2021 [ 12600 ]
        jicg José Ignacio Carretero Guarde made changes -
        Resolution Done [ 10000 ]
        Status Answered [ 10104 ] Closed [ 6 ]
        jicg José Ignacio Carretero Guarde made changes -
        Status In Progress [ 3 ] Answered [ 10104 ]
        backlogmanager Backlog Manager made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        fla Fernando Lopez made changes -
        Assignee José Ignacio Carretero Guarde [ jicg ]
        fla Fernando Lopez made changes -
        Field Original Value New Value
        HD-Enabler Orion [ 10875 ]
        Description
        backlogmanager Backlog Manager created issue -

          People

          • Assignee:
            jicg José Ignacio Carretero Guarde
            Reporter:
            backlogmanager Backlog Manager
          • Votes:
            0
            Watchers:
            1
