OrangeFS over InfiniBand

Lindolfo Meira meira at cesup.ufrgs.br
Fri Dec 7 08:57:40 EST 2018


Hi guys.

Following on my issue, if I could just get some additional pointers from 
you, it would be great.

I've been able to saturate my 50 Gb/s link when performing I/O from a 
single OrangeFS client, and that's great. But when I scale to 6 or more 
clients, I can't get close to the 300 Gb/s I was expecting my 6 servers 
to provide.

I managed to get 154 Gb/s twice, using the following set of parameters 
with MPIIO in IOR: 4 tasks per client, 12 clients, a 4 MB transfer size, 
and a 1 GB block size. If I increase the number of tasks per client or 
the number of clients, performance gets worse.
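
Just so we're looking at the same thing, that run corresponds roughly to 
an invocation like the following (the host file and mount point are just 
placeholders for my actual setup; 48 ranks being 12 clients times 4 tasks 
per client):

  mpirun -np 48 --hostfile clients.txt \
      ior -a MPIIO -w -t 4m -b 1g -o /mnt/orangefs/ior.dat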

As I said before, I don't think the problem is with the network, but just 
so you know, my switch is a Mellanox MSX6710. And all 6 of my OrangeFS 
servers operate with stable 50 Gb/s links (that's measured performance; 
their adapters are rated at 56 Gb/s).

Any input is appreciated.


Lindolfo Meira, MSc
Diretor Geral, Centro Nacional de Supercomputação
Universidade Federal do Rio Grande do Sul
+55 (51) 3308-3139

On Thu, 29 Nov 2018, Lindolfo Meira wrote:

> Hi Becky.
> 
> I thought I'd tested for the optimum number of tasks and found 4 to be the 
> sweet spot, but apparently I was wrong. Following your advice, I reran the 
> tests and found that 8 tasks work best. Now I'm getting around 46 Gb/s. 
> Awesome :D
> 
> Thanks, Becky. Thank you all, guys.
> 
> 
> Lindolfo Meira, MSc
> Diretor Geral, Centro Nacional de Supercomputação
> Universidade Federal do Rio Grande do Sul
> +55 (51) 3308-3139
> 
> On Thu, 29 Nov 2018, Becky Ligon wrote:
> 
> > Try using more tasks.  Most likely, you are not saturating the link.
> > 
> > Becky Ligon
> > 
> > On Thu, Nov 29, 2018 at 2:42 PM Lindolfo Meira <meira at cesup.ufrgs.br> wrote:
> > 
> > > Hello Mike.
> > >
> > > I get about 9 Gb/s.
> > >
> > > I've been testing the single client with pvfs2-cp because I assumed it
> > > uses OrangeFS's direct interface and would therefore achieve better
> > > results. But using the MPIIO interface with IOR, with 4 tasks, I managed
> > > to achieve 33.6 Gb/s. That's better than the 25 Gb/s I get with pvfs2-cp,
> > > but still far from the 50 Gb/s the network allows.
> > >
> > >
> > > Lindolfo Meira, MSc
> > > Diretor Geral, Centro Nacional de Supercomputação
> > > Universidade Federal do Rio Grande do Sul
> > > +55 (51) 3308-3139
> > >
> > > On Thu, 29 Nov 2018, Mike Marshall wrote:
> > >
> > > > Hi Lindolfo
> > > >
> > > > I wonder if you'd see different results using
> > > > "dd if=/dev/zero of=/pvfsmnt/foo count=128 bs=4194304"
> > > > instead of pvfs2-cp? That block size is the same size as the
> > > > buffer shared by the kernel module and the client...
> > > >
> > > > -Mike
> > > >
> > > >
> > > > On Thu, Nov 29, 2018 at 9:19 AM Lindolfo Meira <meira at cesup.ufrgs.br> wrote:
> > > > >
> > > > > Hi David.
> > > > >
> > > > > Yes, I updated the drivers to version 4.4-2 (the latest available 
> > > > > when I did it), but the error didn't go away. And the adapters have 
> > > > > the latest firmware available, as does the switch.
> > > > >
> > > > >
> > > > > Lindolfo Meira, MSc
> > > > > Diretor Geral, Centro Nacional de Supercomputação
> > > > > Universidade Federal do Rio Grande do Sul
> > > > > +55 (51) 3308-3139
> > > > >
> > > > > On Thu, 29 Nov 2018, David Reynolds wrote:
> > > > >
> > > > > > Hi Lindolfo,
> > > > > >
> > > > > > In regards to the mlx4_core error, have you tried updating the 
> > > > > > firmware on the IB cards? I don’t remember seeing this error 
> > > > > > myself, but I found some posts on the Mellanox forums that say 
> > > > > > updating the firmware fixed similar issues.
> > > > > >
> > > > > > David Reynolds
> > > > > > Software Engineer
> > > > > > Omnibond Systems, LLC.
> > > > > >
> > > > > > > On Nov 23, 2018, at 2:25 PM, Lindolfo Meira <meira at cesup.ufrgs.br> wrote:
> > > > > > >
> > > > > > > Hi Boyd.
> > > > > > >
> > > > > > > Yes, that's correct. All 6 servers are exclusively serving. The 
> > > > > > > one client is exclusively "clienting".
> > > > > > >
> > > > > > > The kernel version is 4.12.14 (the latest available with OpenSUSE 
> > > > > > > Leap 15), and yes, I'm using the upstream orangefs module.
> > > > > > >
> > > > > > >
> > > > > > > Lindolfo Meira, MSc
> > > > > > > Diretor Geral, Centro Nacional de Supercomputação
> > > > > > > Universidade Federal do Rio Grande do Sul
> > > > > > > +55 (51) 3308-3139
> > > > > > >
> > > > > > >> On Fri, 23 Nov 2018, Boyd Wilson wrote:
> > > > > > >>
> > > > > > >> Lindolfo,
> > > > > > >> I am copying the users at lists.orangefs.org list; the older 
> > > > > > >> one was having DNS migration issues. I would have recommended 
> > > > > > >> the latest OFED stack, but I have not seen the issue you 
> > > > > > >> describe. Also, what is the Linux kernel version, and which 
> > > > > > >> orangefs kmod are you using, the upstream one? Also, with the 
> > > > > > >> single client: it is not also an OrangeFS server, correct?
> > > > > > >>
> > > > > > >> -b
> > > > > > >>
> > > > > > >> On Fri, Nov 23, 2018 at 12:31 PM Lindolfo Meira <meira at cesup.ufrgs.br> wrote:
> > > > > > >>
> > > > > > >>> Hi.
> > > > > > >>>
> > > > > > >>> I've just finished setting up an OrangeFS installation and was 
> > > > > > >>> wondering if I could get some pointers from you, since I don't 
> > > > > > >>> think my system is performing at the level it should. I've 
> > > > > > >>> already examined all the OrangeFS documentation and the FAQ, 
> > > > > > >>> but nothing helped. The documentation fails to mention any 
> > > > > > >>> actual community mailing lists, so I tried to subscribe to a 
> > > > > > >>> pvfs2-users list I found on the Internet, but it seems to be 
> > > > > > >>> dead.
> > > > > > >>>
> > > > > > >>> My OrangeFS installation has 1 client and 6 exclusive servers 
> > > > > > >>> (each serving meta and data, as the attached config file 
> > > > > > >>> shows). Each of the servers has an XFS-formatted RAID6 volume 
> > > > > > >>> dedicated to OrangeFS. All of the nodes are connected by 
> > > > > > >>> 56 Gb/s InfiniBand adapters.
> > > > > > >>>
> > > > > > >>> I've performed a benchmark on the network and achieved a very 
> > > > > > >>> stable mark of 50 Gb/s per link.
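> > > > > > >>>
> > > > > > >>> (For concreteness, by a link benchmark I mean a point-to-point 
> > > > > > >>> RDMA bandwidth test, something along the lines of perftest's 
> > > > > > >>> ib_write_bw: run "ib_write_bw -d mlx4_0" on one node and 
> > > > > > >>> "ib_write_bw -d mlx4_0 <server>" on the other, with the device 
> > > > > > >>> name being whatever the adapter registers as.)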
> > > > > > >>>
> > > > > > >>> When I benchmark the XFS-formatted RAID6 volumes individually 
> > > > > > >>> (logging into the servers and writing directly to the 
> > > > > > >>> partition; sequential writes, using dd), I get a write 
> > > > > > >>> performance of about 11 Gb/s on average. Since there are 6 
> > > > > > >>> servers, I'm expecting at the very least something close to 
> > > > > > >>> 66 Gb/s of aggregate write performance from my parallel file 
> > > > > > >>> system, right? 6 times 11 Gb/s. But for now my system has a 
> > > > > > >>> single client, and its network adapter reaches at most 
> > > > > > >>> 50 Gb/s, so that's my real constraint.
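> > > > > > >>>
> > > > > > >>> (Roughly the kind of dd I mean, with the target path being 
> > > > > > >>> wherever the RAID6 volume is mounted on each server, and 
> > > > > > >>> oflag=direct optional depending on whether the page cache 
> > > > > > >>> should be involved:
> > > > > > >>>
> > > > > > >>>   dd if=/dev/zero of=/mnt/raid6/ddtest bs=4M count=8192 oflag=direct
> > > > > > >>>
> > > > > > >>> For reference, 11 Gb/s is roughly 1.4 GB/s.)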
> > > > > > >>>
> > > > > > >>> However, I've been unable to achieve more than 25 Gb/s when 
> > > > > > >>> writing to such a system using pvfs2-cp, and I can't 
> > > > > > >>> understand why. I'm using a buffer size of 1 GiB and a stripe 
> > > > > > >>> size of 2 MiB (which seem to be the optimum values for me; 
> > > > > > >>> anything different from these hurts performance).
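> > > > > > >>>
> > > > > > >>> (For the record, the copies look roughly like the following, 
> > > > > > >>> if I have the pvfs2-cp option names right: -b for the buffer 
> > > > > > >>> size, -s for the strip size, -t for timing output; the paths 
> > > > > > >>> are just placeholders:
> > > > > > >>>
> > > > > > >>>   pvfs2-cp -t -b 1073741824 -s 2097152 /tmp/bigfile /mnt/orangefs/bigfile
> > > > > > >>>
> > > > > > >>> where 1073741824 is 1 GiB and 2097152 is 2 MiB.)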
> > > > > > >>>
> > > > > > >>> Another interesting thing to note is that when I use a single 
> > > > > > >>> data file (thus a single server), I get a write performance of 
> > > > > > >>> about 9 Gb/s. I take that as pretty good, since writing 
> > > > > > >>> directly to the partition gave me 11 Gb/s. But as I increase 
> > > > > > >>> the number of data files, the system doesn't scale as it 
> > > > > > >>> should.
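> > > > > > >>>
> > > > > > >>> (The number of data files is what I vary with pvfs2-cp's -n 
> > > > > > >>> option, e.g. -n 1 for the single-server case and -n 6 to 
> > > > > > >>> stripe across all six servers, assuming I have that option 
> > > > > > >>> name right.)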
> > > > > > >>>
> > > > > > >>> I'm using OpenSUSE Leap 15.0 with inbox IB drivers. The 
> > > > > > >>> network doesn't seem to be a problem, but I thought maybe 
> > > > > > >>> OrangeFS would do better with the latest Mellanox OFED 
> > > > > > >>> drivers. So I installed them and recompiled the code. Still 
> > > > > > >>> nothing. I didn't tweak the parameters of the IB drivers, 
> > > > > > >>> though.
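> > > > > > >>>
> > > > > > >>> (By "recompiled the code" I mean rebuilding OrangeFS against 
> > > > > > >>> the OFED stack, roughly "./configure --with-openib=/usr 
> > > > > > >>> --prefix=/opt/orangefs && make && make install"; the prefix is 
> > > > > > >>> mine, and I may be misremembering the exact configure switch.)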
> > > > > > >>>
> > > > > > >>> Lastly, regardless of the driver (SUSE inbox or latest 
> > > > > > >>> Mellanox OFED), I noticed that mlx4_core logs this strange 
> > > > > > >>> message on the client every time I use pvfs2-cp: "mlx4_core 
> > > > > > >>> command 0x1f failed: fw status = 0x2".
> > > > > > >>>
> > > > > > >>> What am I missing here? Any help is appreciated.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>>
> > > > > > >>> Lindolfo Meira, MSc
> > > > > > >>> Diretor Geral, Centro Nacional de Supercomputação
> > > > > > >>> Universidade Federal do Rio Grande do Sul
> > > > > > >>> +55 (51) 3308-3139
> > > > > >
> > > >
> > 

