[HELP-7721] [Fiware-lab-help] SpainTenerife node - Incorrect disk ocupation values reported on Infographics - JIRA

Details

Type: extRequest
Status: Closed
Priority: Major
Resolution: Done
Fix Version/s: 2021
Component/s: FIWARE-LAB-HELP
Labels:
None

Sender Email:
roberto.gil@atos.net
External Participants:
carlos.fernandezc@atos.net
HD-Enabler:
Ops - Health
HD-Node:
Spain

Description

Hello,

The information shown on Infographics for the Spain Tenerife node seems incorrect. It shows 100% disk occupancy, which is wrong, and the numerical values shown on the clocks are also incorrect. All our compute nodes are mirrors of each other, yet each one shows a different value.

In fact, we recently increased our nodes' capacity to 400 GB each, and this change is not reflected on Infographics either.

Could you please check?

Thanks!

[Created via e-mail received from: "Gil Munoz, Roberto" <roberto.gil@atos.net>]

Activity

Atos Spain Tenerife Node Support Team added a comment - 07/Apr/17 9:57 AM

Hello,

Yes, that is actually the case. The /var/lib/nova/ partition hosts the machines, and we have around 170 GB on that partition. The reported value appears to be the actual one: there are thin-provisioned machines adding up to 420 GB, which explains what is being reported to the monitoring.

I think this ticket can be closed, as the value is reported by our backend and is in fact correct. We will investigate internally to find out where the overprovisioning fits into the picture, as it doesn't seem to be configured in our system by default.
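
The arithmetic behind this explanation can be sketched in Python. The function names are illustrative, and clamping the gauge at 100% is an assumption about how Infographics displays the value; only the 170 GB and 420 GB figures come from the comment above.

```python
def occupancy_percent(allocated_gb: float, physical_gb: float) -> float:
    """Allocated virtual disk as a percentage of the physical partition size."""
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    return 100.0 * allocated_gb / physical_gb


def displayed_percent(allocated_gb: float, physical_gb: float) -> float:
    """Clamp at 100, as a gauge that cannot show more than 'full' would (assumption)."""
    return min(100.0, occupancy_percent(allocated_gb, physical_gb))


def implied_overcommit(allocated_gb: float, physical_gb: float) -> float:
    """Smallest allocation ratio under which the allocation would fit."""
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    return allocated_gb / physical_gb


if __name__ == "__main__":
    physical, allocated = 170, 420  # GB, figures quoted in the comment
    print(f"raw occupancy:      {occupancy_percent(allocated, physical):.1f}%")
    print(f"gauge would show:   {displayed_percent(allocated, physical):.1f}%")
    print(f"implied overcommit: {implied_overcommit(allocated, physical):.2f}x")
```

Under these assumptions, thin-provisioned disks adding up to more than the partition size are enough to pin the gauge at 100%, even though the partition itself is not full.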

José Ignacio Carretero Guarde added a comment - 06/Apr/17 7:00 PM

Any responses? Can we close the ticket?

José Ignacio Carretero Guarde added a comment - 28/Mar/17 4:58 PM

It would be nice if you could provide the output of the "df -h" command for any of your compute nodes.
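
For reference, figures similar to what `df` reports for a mount point can also be collected from Python. This is only a sketch: the `disk_report` helper is hypothetical, and the /var/lib/nova/instances path is the mount point suggested elsewhere in this thread, not a confirmed location.

```python
import os
import shutil

GIB = 1024 ** 3


def disk_report(path: str) -> dict:
    """Return total/used/free space (GiB) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    # Guard against pseudo-filesystems that report a size of zero.
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_gib": round(usage.total / GIB, 1),
        "used_gib": round(usage.used / GIB, 1),
        "free_gib": round(usage.free / GIB, 1),
        "used_percent": round(used_percent, 1),  # used / total
    }


if __name__ == "__main__":
    # Assumed mount point; fall back to "/" when it does not exist on this host.
    target = "/var/lib/nova/instances"
    if not os.path.exists(target):
        target = "/"
    print(disk_report(target))
```

Note that this measures space physically used on the filesystem, which is not the same thing as the virtual disk size allocated to thin-provisioned instances.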

Daniele Santoro added a comment - 28/Mar/17 4:35 PM

On /var/lib/nova/instances, if I'm not wrong. Can you confirm, Atos Spain Tenerife Node Support Team?

José Ignacio Carretero Guarde added a comment - 28/Mar/17 4:29 PM

Where are those 400 GB disks mounted on the compute nodes?

Daniele Santoro added a comment - 24/Mar/17 11:38 AM

OK, I agree. I just need confirmation that the disk space numbers from the "nova host-describe" command we analysed depend on that value. So maybe you could:

1) Run the command "nova host-describe NODEx"
2) Change the replication factor value and restart nova
3) Run the command "nova host-describe NODEx" again

Then check whether those numbers change and confirm.

Then we can close this ticket, and I will plan the implementation of a new feature in the monitoring system to support this disk replication factor. Once it is ready, we can test and tune the value for your region.

Sounds good?
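
The before/after comparison in the steps above could be automated along these lines. This is a sketch under assumptions: it expects the command output as an ASCII table with PROJECT and disk_gb columns, the sample tables contain made-up numbers rather than real output from this node, and the helper names are hypothetical. Whether any row actually changes is exactly what the test is meant to find out.

```python
def parse_host_describe(text: str) -> dict:
    """Map each PROJECT label to its disk_gb value from an ASCII table."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the +----+ border lines
    ]
    if not rows:
        raise ValueError("no table rows found")
    header, body = rows[0], rows[1:]
    project_col = header.index("PROJECT")
    disk_col = header.index("disk_gb")
    return {row[project_col]: int(row[disk_col]) for row in body}


def disk_changes(before: str, after: str) -> dict:
    """Return {label: (old, new)} for every row whose disk_gb differs."""
    old, new = parse_host_describe(before), parse_host_describe(after)
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}


# Invented sample data, for illustration only.
SAMPLE_BEFORE = """\
+-------+------------+-----+-----------+---------+
| HOST  | PROJECT    | cpu | memory_mb | disk_gb |
+-------+------------+-----+-----------+---------+
| NODEx | (total)    | 16  | 65536     | 170     |
| NODEx | (used_now) | 12  | 40960     | 420     |
+-------+------------+-----+-----------+---------+
"""
SAMPLE_AFTER = SAMPLE_BEFORE.replace("| 170     |", "| 255     |")

if __name__ == "__main__":
    print(disk_changes(SAMPLE_BEFORE, SAMPLE_AFTER))
```

An empty result would suggest that the reported numbers do not depend on the changed setting.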

Atos Spain Tenerife Node Support Team added a comment - 24/Mar/17 9:54 AM

Hi, Daniele

Yes, I have been checking this point. I haven't found the disk_allocation_ratio parameter in our nova.conf configuration files, which makes me think that either the default value is not 1, or the problem lies somewhere else.

To push this forward, here is my proposal: I am going to force disk_allocation_ratio to 1.5 on the controllers, and then you can patch the monitoring to take into account that the value may differ from 1. That way we can check afterwards whether the values match.
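
Checking whether the parameter is set can be scripted as follows. This is a sketch, not the monitoring system's code: it assumes the option lives in the [DEFAULT] section of nova.conf, and the fallback of 1.0 is simply the value the monitoring API is said in this thread to assume, not a statement about Nova's own default. The helper name is hypothetical.

```python
from configparser import ConfigParser

# Value the monitoring API assumes, according to this thread.
MONITORING_ASSUMED_RATIO = 1.0


def read_disk_allocation_ratio(conf_text: str,
                               fallback: float = MONITORING_ASSUMED_RATIO) -> float:
    """Return disk_allocation_ratio from nova.conf-style text, or `fallback`."""
    # interpolation=None: treat '%' in values literally.
    # strict=False: tolerate repeated sections or options in the file.
    parser = ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # A value written with a decimal comma ("1,5") raises ValueError here,
    # so the setting should be written with a dot ("1.5").
    return parser.getfloat("DEFAULT", "disk_allocation_ratio", fallback=fallback)


if __name__ == "__main__":
    forced = "[DEFAULT]\ndisk_allocation_ratio = 1.5\n"
    unset = "[DEFAULT]\nverbose = true\n"
    print(read_disk_allocation_ratio(forced))
    print(read_disk_allocation_ratio(unset))
```

On a real controller the text would come from the nova.conf file itself, for example via `open(path).read()`.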

Daniele Santoro added a comment - 24/Mar/17 9:24 AM

I inspected the monitoring API code, and it confirms that we do not take “disk_allocation_ratio” into account; or rather, we assume a default of 1. Therefore, if a different value is set for your region, the disk space reported by the monitoring API is wrong.

Atos Spain Tenerife Node Support Team: Have you had a chance to check the value of that configuration parameter in your region? As specified in table 3.52 here: https://goo.gl/5WRrkf, it should be present in the nova.conf file on your controllers.

I need this information to confirm the cause of the issue; then I can start developing a new feature in the monitoring system to support this region customisation.
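
A minimal sketch of the effect described here, assuming usable capacity scales linearly with disk_allocation_ratio. The function name and the sample figures are hypothetical, and this is not the monitoring API's actual code.

```python
def reported_disk_percent(used_gb: float, physical_gb: float,
                          disk_allocation_ratio: float = 1.0) -> float:
    """Disk usage as a percentage of the ratio-adjusted capacity."""
    if physical_gb <= 0 or disk_allocation_ratio <= 0:
        raise ValueError("capacity and ratio must be positive")
    capacity_gb = physical_gb * disk_allocation_ratio
    return 100.0 * used_gb / capacity_gb


if __name__ == "__main__":
    used, physical = 100.0, 200.0  # invented figures, in GB
    # Ratio ignored (treated as 1), as the monitoring API is said to do today.
    print(reported_disk_percent(used, physical))
    # Same host if the region actually runs with a ratio of 2.
    print(reported_disk_percent(used, physical, disk_allocation_ratio=2.0))
```

With the ratio fixed at 1, a region that overcommits disk would be shown as fuller than its scheduler considers it to be.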

Daniele Santoro added a comment - 14/Mar/17 3:43 PM

This issue is related to OPS-1445. Its status can also be tracked there.

Daniele Santoro added a comment - 10/Feb/17 5:46 PM

Dear, can we try to move ahead on this ticket? We plan to solve it in the next sprint, 6.2.2. Please provide the required information.

Daniele Santoro added a comment - 16/Jan/17 5:00 PM

Dear operators, any news about my last request, sent by e-mail?

Daniele Santoro added a comment - 13/Jan/17 10:52 AM

The monitoring chain has been fixed and the task is back in progress. I'm waiting for detailed information from the node operators.

Daniele Santoro added a comment - 21/Dec/16 5:03 PM

We currently have a problem in the monitoring chain which prevents Infographics from showing the latest monitored data from the regions. For this reason I cannot verify your problem right now, but I will take a look once the issue with the monitoring system is resolved.

Daniele Santoro added a comment - 29/Nov/16 9:52 AM

Dear,

I will take a look at this issue and I will update you as soon as possible.

Daniele

ilknur chulani added a comment - 23/Nov/16 1:04 PM

Hi Daniele,

It seems Pablo forgot to assign the ticket to you. Could you kindly take a look? We would really like to sort this issue out soon.

Thanks in advance for your help.

Regards,

ilknur

Pablo Rodríguez Archilla added a comment - 17/Nov/16 1:25 PM - edited

Well,

This issue falls between 3 different components (and maintainers): Kasia Di Meo would be the right person for visualization issues, Daniele Santoro for wrong aggregated information at Monitoring API, and me for the rest of components of Monitoring.

I think that, in this case, we should assign the ticket to Daniele and he will contact the rest in order to solve this known problem which involves different components.

Regards.

ilknur chulani added a comment - 17/Nov/16 1:17 PM

Dear Pablo,

I am assigning this issue to you as I believe you are the maintainer of the Infographic monitoring tools. Could you kindly check whether there is a defect in the tool, or a missing/incorrect configuration on our part, etc.?

Thanks,

ilknur

Budapest Node Helpdesk added a comment - 17/Nov/16 1:07 PM

The issue has been emailed:

Time sent: 17/Nov/16 1:07 PM
To: roberto.gil@atos.net
with subject: (HELP-7721) [Fiware-lab-help] SpainTenerife node - Incorrect disk ocupation values reported on Infographics

Dear Roberto,

I forwarded your problem to the Spain Tenerife Support Team.

Best Regards,
Gergő

People

Assignee:

Spain Node Support Team

Reporter:

FW External User

Votes:

0

Watchers:

7

Dates

Created:

17/Nov/16 11:52 AM

Updated:

27/May/21 10:56 AM

Resolved:

07/Apr/17 10:05 AM
