[visit-developers] Need another set of eyes

Hank Childs hankchilds at yahoo.com
Thu May 28 10:38:54 EDT 2009


Hi Tom,

I don't think it is uninitialized memory, because I can reproduce it on a different machine.  Also, I could only reproduce it after I did an svn update.

See the attached images. before_OpaqueComposite_00[01]_001.png show what each processor has before the composite.  Note that they are not evenly dividing screen space.  That may be new, or it may be how it has always worked.  (I believe the transparency actor duplicates data along boundaries.)

If you look at after_AllComposites_003.png, you'll see that those lines are actually where there is data on both processors.  They are also where the perspective transform makes data appear for the front face, but not the back face (that's poor phrasing, so I hope you know what I mean).

If I had to guess, I would say that the composite phase used to respect pre-made boundaries set up by the transparency actor, but that it no longer does.  Jeremy can probably give a little more info, as I think he is more familiar with the implementation.
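
For anyone not familiar with the composite step, here is a minimal sketch of a per-pixel depth merge, the general kind of operation a z-buffer composite performs. Pixel and MergePixels are hypothetical names made up for illustration; this is not the code in avtWholeImageCompositerWithZ. It assumes a smaller z means closer to the camera. The point is that where both processors have data for the same pixel, only one contribution survives, so the result depends on how the overlap along boundaries is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pixel type: a depth value plus an RGB color (not the VisIt type).
struct Pixel {
    float z;
    unsigned char r, g, b;
};

// Merge two partial images of equal size, keeping the nearer fragment per pixel.
// Assumption: a smaller z is closer to the camera.
std::vector<Pixel> MergePixels(const std::vector<Pixel>& a,
                               const std::vector<Pixel>& b)
{
    assert(a.size() == b.size());
    std::vector<Pixel> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Where both inputs cover the pixel, the one with smaller depth wins.
        out[i] = (b[i].z < a[i].z) ? b[i] : a[i];
    }
    return out;
}
```

In a parallel run the same comparison would be applied pairwise across processors, for example as a user-defined reduction operator.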

Best,
Hank

--- On Wed, 5/27/09, tom fogal <tfogal at alumni.unh.edu> wrote:

From: tom fogal <tfogal at alumni.unh.edu>
Subject: [visit-developers] Need another set of eyes
To: visit-developers at ornl.gov
Date: Wednesday, May 27, 2009, 4:12 PM

I've been staring at a compositing issue for too long; I must be
missing something obvious.  Could someone else take a look?

r7402 revealed or created an interesting bug.  I've
attached the 9th images (baseline, my machine, diff) from
tests/rendering/transparency.py, which demonstrate the problem.
Basically I'm getting two lines which appear to average the background
color and the actual image.

Valgrind reports we're doing Bad Things there.  I've also attached
a valgrind log; there's more than one issue, but the one that I'm
referring to is the last entry.  Essentially we new[] in that method
(well, it's a vector now, but the trace is essentially equivalent), and
use that new'd buffer in reduces.  Somehow 7 bytes of that data are
uninitialized.
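
One possible reading of that last entry, offered as a guess rather than a diagnosis: valgrind reports the address as "7 bytes inside" the block, which is an offset, not a count. If zfpixel is a float depth followed by three color bytes (an assumption; its definition is not shown in this thread), compilers on common ABIs pad it to 8 bytes, and offset 7 is the padding byte that no member assignment ever writes. A 720,000-byte block would then hold 90,000 pixels. The sketch below uses a hypothetical ZFPixelGuess struct to show that layout, plus one way to make the padding defined before handing the buffer to MPI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for zfpixel; the real definition may differ.
struct ZFPixelGuess {
    float z;          // offsets 0..3 on common ABIs
    unsigned char r;  // offset 4
    unsigned char g;  // offset 5
    unsigned char b;  // offset 6
    // offset 7: padding typically added so arrays keep the float aligned
};

// Offset of the first byte after the last member, i.e. the first padding byte.
std::size_t FirstPaddingOffset()
{
    return offsetof(ZFPixelGuess, b) + sizeof(unsigned char);
}

// Number of trailing padding bytes in each element.
std::size_t TrailingPadding()
{
    return sizeof(ZFPixelGuess) - FirstPaddingOffset();
}

// Zero a whole buffer, padding included, so every byte is defined.
void ZeroPixels(ZFPixelGuess* px, std::size_t n)
{
    assert(px != nullptr || n == 0);
    if (n > 0) {
        std::memset(px, 0, n * sizeof(ZFPixelGuess));
    }
}
```

If this reading is right, the warning would be harmless for the image itself, since padding carries no pixel data, and the rendering artifact would need a different explanation.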

I've been staring at that code, throwing in little hacks.  I wrote
it out to a std::ofstream in hopes of triggering the valgrind issue
earlier (it didn't).  I deleted the entire method and rewrote it with
nothing in front of me -- and saw the same bug.

Any ideas are more than welcome.

-tom

==24351== Memcheck, a memory error detector.
==24351== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==24351== Using LibVEX rev 1884, a library for dynamic binary translation.
==24351== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==24351== Using valgrind-3.4.1, a dynamic binary instrumentation framework.
==24351== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==24351== For more details, rerun with: -v
==24351== 
==24351== My PID = 24351, parent PID = 24333.  Prog and args are:
==24351==    /home/tfogal/dev/visit/src/exe/engine_par
==24351==    -host
==24351==    shigeru.sci.utah.edu
==24351==    -dv
==24351==    -timeout
==24351==    480
==24351==    -debug_engine
==24351==    5
==24351==    -norun
==24351==    engine_par
==24351==    -norun
==24351==    engine_par
==24351==    -norun
==24351==    engine_par
==24351==    -valgrind
==24351==    --tool=memcheck
==24351==    --smc-check=all
==24351==    --log-file=e-vg.%p
==24351==    --error-limit=no
==24351==    --track-origins=yes
==24351==    --leak-check=no
==24351==    --gen-suppressions=all
==24351==    --suppressions=/home/tfogal/dev/vg-suppressions/nvidia.vgsupp
==24351==    --suppressions=/home/tfogal/dev/vg-suppressions/ld.vgsupp
==24351==    engine_par
==24351==    -timeout
==24351==    600
==24351==    -notrackmem
==24351==    -pixdiff=5
==24351==    -avgdiff=5
==24351==    -numdiff=0
==24351==    -timing
==24351==    -port
==24351==    5600
==24351==    -noloopback
==24351==    -plugindir
==24351==    :/home/tfogal/.visit/linux-x86_64/plugins:/home/tfogal/dev/visit/src/plugins
==24351==    -key
==24351==    1822c4c31a74c300a456
==24351== 
==24351== Warning: client syscall munmap tried to modify addresses 0xffffffffffffffff-0xffe
==24351== Syscall param writev(vector[...]) points to uninitialised byte(s)
==24351==    at 0x123F0733: writev (in /lib/libc-2.7.so)
==24351==    by 0x15BDCAB9: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:265)
==24351==    by 0x15BDE1A0: mca_oob_tcp_peer_send (oob_tcp_peer.c:197)
==24351==    by 0x15BE2F2A: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
==24351==    by 0x159CF7EC: orte_rml_oob_send (rml_oob_send.c:137)
==24351==    by 0x159D019A: orte_rml_oob_send_buffer (rml_oob_send.c:269)
==24351==    by 0x15FF23DF: allgather (grpcomm_bad_module.c:369)
==24351==    by 0x15FF2D9E: modex (grpcomm_bad_module.c:497)
==24351==    by 0x10D66CAF: ompi_mpi_init (ompi_mpi_init.c:626)
==24351==    by 0x10DA3C88: PMPI_Init (pinit.c:80)
==24351==    by 0xABEC01F: PAR_Init(int&, char**&) (avtParallel.C:153)
==24351==    by 0x4ED7722: Engine::Initialize(int*, char***, bool) (Engine.C:491)
==24351==  Address 0x1979d96b is 107 bytes inside a block of size 256 alloc'd
==24351==    at 0x4C22A51: realloc (vg_replace_malloc.c:429)
==24351==    by 0xFDC80D6: (within /usr/lib/libGL.so.180.44)
==24351==    by 0x7FEFFBA7F: ???
==24351==  Uninitialised value was created by a stack allocation
==24351==    at 0x112F8A60: opal_ifinit (if.c:147)
{
   <insert a suppression name here>
   Memcheck:Param
   writev(vector[...])
   fun:writev
   fun:mca_oob_tcp_msg_send_handler
   fun:mca_oob_tcp_peer_send
   fun:mca_oob_tcp_send_nb
   fun:orte_rml_oob_send
   fun:orte_rml_oob_send_buffer
   fun:allgather
   fun:modex
   fun:ompi_mpi_init
   fun:PMPI_Init
   fun:_Z8PAR_InitRiRPPc
   fun:_ZN6Engine10InitializeEPiPPPcb
}
==24351== 
==24351== Uninitialised byte(s) found during client check request
==24351==    at 0x11308144: valgrind_module_isdefined (memchecker_valgrind_module.c:112)
==24351==    by 0x11307B87: opal_memchecker_base_isdefined (memchecker_base_wrappers.c:34)
==24351==    by 0x10D9A627: memchecker_call (memchecker.h:96)
==24351==    by 0x10D99F80: PMPI_Gather (pgather.c:45)
==24351==    by 0x4ECEE9A: Engine::GetProcessAttributes() (Engine.C:3123)
==24351==    by 0x4ED9F6A: RPCExecutor<ProcInfoRPC>::Execute(ProcInfoRPC*) (Executors.h:1100)
==24351==    by 0x4EF0573: RPCExecutor<ProcInfoRPC>::Update(Subject*) (RPCExecutor.h:67)
==24351==    by 0x6CE28A9: Subject::Notify() (Subject.C:188)
==24351==    by 0x6B921D0: AttributeSubject::Notify() (AttributeSubject.C:99)
==24351==    by 0xAEF5659: MPIXfer::Process() (MPIXfer.C:320)
==24351==    by 0x4ED041E: Engine::ProcessInput() (Engine.C:1508)
==24351==    by 0x4ED3634: Engine::EventLoop() (Engine.C:1457)
==24351==  Address 0x7feffb9b8 is on thread 1's stack
==24351==  Uninitialised value was created by a stack allocation
==24351==    at 0x4ECEC41: Engine::GetProcessAttributes() (Engine.C:3081)
{
   <insert a suppression name here>
   Memcheck:User
   fun:valgrind_module_isdefined
   fun:opal_memchecker_base_isdefined
   fun:memchecker_call
   fun:PMPI_Gather
   fun:_ZN6Engine20GetProcessAttributesEv
   fun:_ZN11RPCExecutorI11ProcInfoRPCE7ExecuteEPS0_
   fun:_ZN11RPCExecutorI11ProcInfoRPCE6UpdateEP7Subject
   fun:_ZN7Subject6NotifyEv
   fun:_ZN16AttributeSubject6NotifyEv
   fun:_ZN7MPIXfer7ProcessEv
   fun:_ZN6Engine12ProcessInputEv
   fun:_ZN6Engine9EventLoopEv
}
==24351== 
==24351== Syscall param writev(vector[...]) points to uninitialised byte(s)
==24351==    at 0x123F0733: writev (in /lib/libc-2.7.so)
==24351==    by 0x182952D0: mca_btl_tcp_frag_send (btl_tcp_frag.c:108)
==24351==    by 0x18292E26: mca_btl_tcp_endpoint_send (btl_tcp_endpoint.c:257)
==24351==    by 0x1828F183: mca_btl_tcp_send (btl_tcp.c:391)
==24351==    by 0x17A6D9C9: mca_bml_base_send (bml.h:272)
==24351==    by 0x17A6E4EF: mca_pml_ob1_send_request_start_rdma (pml_ob1_sendreq.c:793)
==24351==    by 0x17A618EB: mca_pml_ob1_send_request_start_btl (pml_ob1_sendreq.h:383)
==24351==    by 0x17A615FC: mca_pml_ob1_send_request_start (pml_ob1_sendreq.h:429)
==24351==    by 0x17A62105: mca_pml_ob1_send (pml_ob1_isend.c:119)
==24351==    by 0x191534A0: ompi_coll_tuned_bcast_intra_split_bintree (coll_tuned_bcast.c:590)
==24351==    by 0x19144C3C: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:263)
==24351==    by 0x10D8E700: PMPI_Bcast (pbcast.c:107)
==24351==  Address 0x150b00ee is 461,918 bytes inside a block of size 527,576 alloc'd
==24351==    at 0x4C2291E: malloc (vg_replace_malloc.c:207)
==24351==    by 0xFDC8196: (within /usr/lib/libGL.so.180.44)
==24351==    by 0x700000000: ???
==24351==    by 0x40FC0F: (within /home/tfogal/dev/visit/src/exe/engine_par)
==24351==    by 0x7FEFFB98F: ???
==24351==    by 0x404FB0C: (within /usr/lib/tls/libnvidia-tls.so.180.44)
==24351==    by 0x10D3B331: ompi_free_list_grow (ompi_free_list.c:188)
==24351==    by 0x110D2B51F: ???
==24351==    by 0x1015F: ???
==24351==    by 0x7: ???
==24351==    by 0x1849BCFF: (within /home/tfogal/sw/lib/openmpi/mca_btl_tcp.so)
==24351==  Uninitialised value was created by a heap allocation
==24351==    at 0x4C2291E: malloc (vg_replace_malloc.c:207)
==24351==    by 0xFDC8196: (within /usr/lib/libGL.so.180.44)
==24351==    by 0x700000000: ???
==24351==    by 0x40FC0F: (within /home/tfogal/dev/visit/src/exe/engine_par)
==24351==    by 0x7FEFFB98F: ???
==24351==    by 0x404FB0C: (within /usr/lib/tls/libnvidia-tls.so.180.44)
==24351==    by 0x10D3B331: ompi_free_list_grow (ompi_free_list.c:188)
==24351==    by 0x110D2B51F: ???
==24351==    by 0x1015F: ???
==24351==    by 0x7: ???
==24351==    by 0x1849BCFF: (within /home/tfogal/sw/lib/openmpi/mca_btl_tcp.so)
{
   <insert a suppression name here>
   Memcheck:Param
   writev(vector[...])
   fun:writev
   fun:mca_btl_tcp_frag_send
   fun:mca_btl_tcp_endpoint_send
   fun:mca_btl_tcp_send
   fun:mca_bml_base_send
   fun:mca_pml_ob1_send_request_start_rdma
   fun:mca_pml_ob1_send_request_start_btl
   fun:mca_pml_ob1_send_request_start
   fun:mca_pml_ob1_send
   fun:ompi_coll_tuned_bcast_intra_split_bintree
   fun:ompi_coll_tuned_bcast_intra_dec_fixed
   fun:PMPI_Bcast
}
==24351== 
==24351== Syscall param writev(vector[...]) points to uninitialised byte(s)
==24351==    at 0x123F0733: writev (in /lib/libc-2.7.so)
==24351==    by 0x182952D0: mca_btl_tcp_frag_send (btl_tcp_frag.c:108)
==24351==    by 0x18294F06: mca_btl_tcp_endpoint_send_handler (btl_tcp_endpoint.c:767)
==24351==    by 0x112E3615: event_process_active (event.c:647)
==24351==    by 0x112E39E8: opal_event_base_loop (event.c:819)
==24351==    by 0x112E37BE: opal_event_loop (event.c:726)
==24351==    by 0x112D3972: opal_progress (opal_progress.c:189)
==24351==    by 0x17A60E84: opal_condition_wait (condition.h:99)
==24351==    by 0x17A62502: ompi_request_wait_completion (request.h:395)
==24351==    by 0x17A62456: mca_pml_ob1_send (pml_ob1_isend.c:125)
==24351==    by 0x191534A0: ompi_coll_tuned_bcast_intra_split_bintree (coll_tuned_bcast.c:590)
==24351==    by 0x19144C3C: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:263)
==24351==  Address 0x1979fac4 is 3,044 bytes inside a block of size 3,800 alloc'd
==24351==    at 0x4C2291E: malloc (vg_replace_malloc.c:207)
==24351==    by 0xFDC8196: (within /usr/lib/libGL.so.180.44)
==24351==  Uninitialised value was created by a heap allocation
==24351==    at 0x4C2291E: malloc (vg_replace_malloc.c:207)
==24351==    by 0xFDC8196: (within /usr/lib/libGL.so.180.44)
{
   <insert a suppression name here>
   Memcheck:Param
   writev(vector[...])
   fun:writev
   fun:mca_btl_tcp_frag_send
   fun:mca_btl_tcp_endpoint_send_handler
   fun:event_process_active
   fun:opal_event_base_loop
   fun:opal_event_loop
   fun:opal_progress
   fun:opal_condition_wait
   fun:ompi_request_wait_completion
   fun:mca_pml_ob1_send
   fun:ompi_coll_tuned_bcast_intra_split_bintree
   fun:ompi_coll_tuned_bcast_intra_dec_fixed
}
==24351== 
==24351== Uninitialised byte(s) found during client check request
==24351==    at 0x11308144: valgrind_module_isdefined (memchecker_valgrind_module.c:112)
==24351==    by 0x11307B87: opal_memchecker_base_isdefined (memchecker_base_wrappers.c:34)
==24351==    by 0x10DAC088: memchecker_call (memchecker.h:96)
==24351==    by 0x10DAB5C6: PMPI_Reduce (preduce.c:43)
==24351==    by 0xA76DC51: avtWholeImageCompositerWithZ::MergeBuffers(int, bool, float const*, unsigned char const*, float*, unsigned char*) (avtWholeImageCompositerWithZ.C:492)
==24351==    by 0xA76EC16: avtWholeImageCompositerWithZ::Execute() (avtWholeImageCompositerWithZ.C:346)
==24351==    by 0x4F20AA2: NetworkManager::Render(bool, std::__debug::vector<int, std::allocator<int> >, bool, int, int, bool) (NetworkManager.C:2302)
==24351==    by 0x4EDA3CC: RPCExecutor<RenderRPC>::Execute(RenderRPC*) (Executors.h:1426)
==24351==    by 0x4EF05EB: RPCExecutor<RenderRPC>::Update(Subject*) (RPCExecutor.h:67)
==24351==    by 0x6CE28A9: Subject::Notify() (Subject.C:188)
==24351==    by 0x6B921D0: AttributeSubject::Notify() (AttributeSubject.C:99)
==24351==    by 0xAEF5659: MPIXfer::Process() (MPIXfer.C:320)
==24351==  Address 0x1c0458af is 7 bytes inside a block of size 720,000 alloc'd
==24351==    at 0x4C222DC: operator new(unsigned long) (vg_replace_malloc.c:230)
==24351==    by 0xA772A0C: __gnu_cxx::new_allocator<zfpixel>::allocate(unsigned long, void const*) (new_allocator.h:92)
==24351==    by 0xA772A3B: std::__norm::_Vector_base<zfpixel, std::allocator<zfpixel> >::_M_allocate(unsigned long) (stl_vector.h:144)
==24351==    by 0xA772A80: std::__norm::_Vector_base<zfpixel, std::allocator<zfpixel> >::_Vector_base(unsigned long, std::allocator<zfpixel> const&) (stl_vector.h:117)
==24351==    by 0xA772B0A: std::__norm::vector<zfpixel, std::allocator<zfpixel> >::vector(unsigned long, zfpixel const&, std::allocator<zfpixel> const&) (stl_vector.h:235)
==24351==    by 0xA772B4A: std::__debug::vector<zfpixel, std::allocator<zfpixel> >::vector(unsigned long, zfpixel const&, std::allocator<zfpixel> const&) (vector:84)
==24351==    by 0xA76D7E5: avtWholeImageCompositerWithZ::MergeBuffers(int, bool, float const*, unsigned char const*, float*, unsigned char*) (avtWholeImageCompositerWithZ.C:439)
==24351==    by 0xA76EC16: avtWholeImageCompositerWithZ::Execute() (avtWholeImageCompositerWithZ.C:346)
==24351==    by 0x4F20AA2: NetworkManager::Render(bool, std::__debug::vector<int, std::allocator<int> >, bool, int, int, bool) (NetworkManager.C:2302)
==24351==    by 0x4EDA3CC: RPCExecutor<RenderRPC>::Execute(RenderRPC*) (Executors.h:1426)
==24351==    by 0x4EF05EB: RPCExecutor<RenderRPC>::Update(Subject*) (RPCExecutor.h:67)
==24351==    by 0x6CE28A9: Subject::Notify() (Subject.C:188)
==24351==  Uninitialised value was created by a stack allocation
==24351==    at 0xA76D748: avtWholeImageCompositerWithZ::MergeBuffers(int, bool, float const*, unsigned char const*, float*, unsigned char*) (avtWholeImageCompositerWithZ.C:435)
{
   <insert a suppression name here>
   Memcheck:User
   fun:valgrind_module_isdefined
   fun:opal_memchecker_base_isdefined
   fun:memchecker_call
   fun:PMPI_Reduce
   fun:_ZN28avtWholeImageCompositerWithZ12MergeBuffersEibPKfPKhPfPh
   fun:_ZN28avtWholeImageCompositerWithZ7ExecuteEv
   fun:_ZN14NetworkManager6RenderEbNSt7__debug6vectorIiSaIiEEEbiib
   fun:_ZN11RPCExecutorI9RenderRPCE7ExecuteEPS0_
   fun:_ZN11RPCExecutorI9RenderRPCE6UpdateEP7Subject
   fun:_ZN7Subject6NotifyEv
   fun:_ZN16AttributeSubject6NotifyEv
   fun:_ZN7MPIXfer7ProcessEv
}
==24351== 
==24351== ERROR SUMMARY: 34 errors from 5 contexts (suppressed: 3006 from 5)
==24351== malloc/free: in use at exit: 11,672,989 bytes in 29,542 blocks.
==24351== malloc/free: 405,295 allocs, 375,753 frees, 155,628,184 bytes allocated.
==24351== For a detailed leak analysis,  rerun with: --leak-check=yes
==24351== For counts of detected errors, rerun with: -v



      
-------------- next part --------------
A non-text attachment was scrubbed...
Name: after_AllComposites_003.png
Type: image/png
Size: 95356 bytes
Desc: not available
URL: <https://email.ornl.gov/pipermail/visit-developers/attachments/20090528/7477f527/after_AllComposites_003-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: before_OpaqueComposite_000_001.png
Type: image/png
Size: 46772 bytes
Desc: not available
URL: <https://email.ornl.gov/pipermail/visit-developers/attachments/20090528/7477f527/before_OpaqueComposite_000_001-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: before_OpaqueComposite_001_001.png
Type: image/png
Size: 36770 bytes
Desc: not available
URL: <https://email.ornl.gov/pipermail/visit-developers/attachments/20090528/7477f527/before_OpaqueComposite_001_001-0001.png>

