[visit-developers] solved with BG_SYSIODPOSIXMODE=1: VisIt on BlueGene/Q - communication problems

Göbbert, Jens Henrik goebbert at vr.rwth-aachen.de
Fri Jun 5 05:32:30 EDT 2015


Hi Brad,
Hi BlueGene Developer,

We still have some trouble with VisIt on our BlueGene/Q.
We tracked it down to the routine "gethostbyname()", which cannot resolve the name of the front-end node and never returns.

It is called here:
http://fossies.org/dox/visit2.9.0/ParentProcess_8C_source.html#l00804

On JUQUEEN (BlueGene/Q), the compute nodes currently cannot resolve hostnames.
But you had VisIt up and running on your BlueGene/Q. By any chance, could you help me contact the system admin who might be able to give me some hints?
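
For illustration, here is a minimal sketch of guarding a name lookup with a timeout, so that a resolver which never returns cannot hang the caller. It is written in Python and is not VisIt's code (VisIt calls gethostbyname() from C++). The function name resolve_with_timeout, the injectable resolver argument and the 2 second default are assumptions made for this example.

```python
import socket
import threading


def resolve_with_timeout(hostname, timeout_s=2.0, resolver=socket.gethostbyname):
    """Return the resolved address, or None if the lookup fails or times out."""
    result = {}

    def worker():
        # Run the (possibly blocking) lookup off the calling thread.
        try:
            result["address"] = resolver(hostname)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            result["error"] = exc

    # A daemon thread does not keep the process alive if the lookup never returns.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        return None  # the lookup is still blocked: treat it as unresolved
    return result.get("address")
```

A caller could fall back to a numeric IP address when the function returns None. Whether such a fallback fits into VisIt's connection code is not covered in this thread.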

Best
Jens


 -----------------------------------------------
 Dipl.-Ing. Jens Henrik Göbbert
 Cross-Sectional Group „Immersive Visualization“ (CSG ImmVis)
 Jülich Aachen Research Alliance, High Performance Computing (JARA-HPC)

 IT Center, RWTH Aachen University, Virtual Reality Group (VRG)
 Kopernikusstraße 6, 52074 Aachen, Germany

 Institute for Advanced Simulation, Jülich Supercomputing Centre (JSC)
 Wilhelm-Johnen-Straße, 52425 Jülich, Germany

 Phone (VRG): +49 - (0)241 - 80 - 24381
 Phone (JSC): +49 - (0)2461 - 61 - 96498
 E-mail: goebbert at jara.rwth-aachen.de
 E-mail: goebbert at vr.rwth-aachen.de
________________________________
From: bjw.ilight at gmail.com [bjw.ilight at gmail.com] on behalf of Brad Whitlock [bjw at ilight.com]
Sent: Monday, May 18, 2015 6:29 PM
To: VisIt Developers; Göbbert, Jens Henrik
Subject: Re: [visit-developers] solved with BG_SYSIODPOSIXMODE=1: VisIt on BlueGene/Q - communication problems

It sounds like you came up with a better solution than I did for socket communication. I had observed that the messages from client to server only completed successfully when the messages were a fixed size. I added a fixed size communication mode enabled by the -fixed-buffer-sockets command line argument in my host profiles to work around the communication issue.
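
The general idea of a fixed-size communication mode can be sketched as follows. This is not the implementation behind -fixed-buffer-sockets, which is not shown in this thread. It is a small Python illustration that pads every message to a constant frame size and stores the real payload length in a header. FRAME_SIZE, send_fixed and recv_fixed are hypothetical names and values chosen for the example.

```python
import socket
import struct

FRAME_SIZE = 1024              # assumed frame size, not taken from VisIt
HEADER = struct.Struct("!I")   # 4-byte big-endian payload length


def send_fixed(sock, payload):
    """Send payload inside exactly one zero-padded frame of FRAME_SIZE bytes."""
    if len(payload) > FRAME_SIZE - HEADER.size:
        raise ValueError("payload does not fit into one frame")
    frame = HEADER.pack(len(payload)) + payload
    sock.sendall(frame.ljust(FRAME_SIZE, b"\0"))


def recv_fixed(sock):
    """Read exactly one frame and return the payload without the padding."""
    frame = bytearray()
    while len(frame) < FRAME_SIZE:
        chunk = sock.recv(FRAME_SIZE - len(frame))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        frame.extend(chunk)
    (length,) = HEADER.unpack_from(frame)
    return bytes(frame[HEADER.size:HEADER.size + length])
```

Because both sides always transfer FRAME_SIZE bytes, the receiver never has to guess how much data belongs to one message.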

Brad


On Mon, May 18, 2015 at 4:54 AM, Göbbert, Jens Henrik <goebbert at vr.rwth-aachen.de> wrote:
Hi BlueGene Developer,

it took some time, but we could track the problem down to the I/O node operating system.
To make a long story short:

VisIt only works on BlueGene/Q if you set the environment variable BG_SYSIODPOSIXMODE to 1:
# BG_SYSIODPOSIXMODE
#    Run I/O operations with POSIX rules:
#        0 == I/O operation that is initiated from a compute node can cause multiple I/O operations on the I/O node.
#        1 == Each I/O operation that is initiated from a compute node completes atomically.
(check http://www.redbooks.ibm.com/redbooks/pdfs/sg247948.pdf Appendix D p.145)

It might be a good idea for VisIt to check this variable and issue a warning if it is not set to 1.
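
Such a check could look roughly like the sketch below. It is a Python illustration only, and VisIt itself would do this in C++. The function name check_bg_posix_mode and the wording of the warning are assumptions. A real check would presumably run only on BlueGene/Q builds.

```python
import os
import warnings


def check_bg_posix_mode(environ=None):
    """Return True if BG_SYSIODPOSIXMODE is "1"; warn and return False otherwise."""
    env = os.environ if environ is None else environ
    value = env.get("BG_SYSIODPOSIXMODE")
    if value == "1":
        return True
    # An unset variable (None) and any value other than "1" are both reported.
    warnings.warn(
        f"BG_SYSIODPOSIXMODE is {value!r}, expected '1'; "
        "socket communication on BlueGene/Q may stall",
        RuntimeWarning,
        stacklevel=2,
    )
    return False
```

Passing a dictionary as environ makes it possible to try the check without touching the real environment.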

Best
Jens Henrik

________________________________
From: Göbbert, Jens Henrik [goebbert at vr.rwth-aachen.de]
Sent: Monday, May 11, 2015 7:27 PM
To: visit-developers at elist.ornl.gov
Subject: [visit-developers] VisIt on BlueGene/Q - communication problems

Hi BlueGene Developer,

linking VisIt with a static libsimV2 on BlueGene/Q works fine,
but the communication between the GUI and the simulation seems to partly fail:

Looking at the VisIt trace file (debug 5), I can see that the GUI can connect to libsimV2
but does not complete the connection procedure:
VisItAttemptToCompleteConnection
AcceptConnection
    Calling accept()
AcceptConnection desc=6
VerifySecurityKeys desc=6
    ReceiveSingleLineFromSocket maxlen=1024, desc=6

It stops at this point and the client 'Window 1' does not respond any more.

I have set up a tunnel through the JUQUEEN front-end node for port 5609 (with PuTTY)
and use Windows 7 on the client side for the GUI.
This connection seems to work (without the tunnel, the VisIt GUI gives errors).

Do I have to tunnel any port other than the one in the sim2 file?
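
One quick way to see whether a tunnelled port accepts TCP connections at all is a small probe like the one below. It is a Python sketch under assumptions: the helper name port_is_open and the timeout are made up for this example, and the host and port to probe depend on the local tunnel setup.

```python
import socket


def port_is_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False
```

A successful probe only shows that something is listening at the near end of the tunnel. It does not prove that data reaches the simulation on the compute node.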

best,
Jens Henrik

