the Rainbow Networks
build optimizations with ioquake3
hyp3rfocus
Forum Addict


Joined: Aug 25, 2007
Posts: 466
Location: england

Posted: Sun May 24, 2009 4:15 pm    Post subject: build optimizations with ioquake3

i've got a copy of the svn code for ioquake3 and occasionally i build a fresh copy of the engine. i've been looking at the Makefile and wondering whether there are any settings i can change to make the engine quicker (linux build).

any suggestions?
hyp3rfocus
Forum Addict


Joined: Aug 25, 2007
Posts: 466
Location: england

Posted: Sun May 24, 2009 4:26 pm    Post subject: Re: build optimizations with ioquake3

this bit of the linux settings looks interesting.

Code::
  ifeq ($(ARCH),i386)
    OPTIMIZE = -O3 -march=i586 -fomit-frame-pointer -ffast-math \
      -funroll-loops -falign-loops=2 -falign-jumps=2 \
      -falign-functions=2 -fstrength-reduce
    HAVE_VM_COMPILED=true
  else

if the architecture is i386 (which mine is) then it uses those settings, so the relevant bit for me to paste into Makefile.local and start tweaking is this...

Code::
    OPTIMIZE = -O3 -march=i586 -fomit-frame-pointer -ffast-math \
      -funroll-loops -falign-loops=2 -falign-jumps=2 \
      -falign-functions=2 -fstrength-reduce

any idea what those settings do?
Falkland
Übergod


Joined: Aug 01, 2008
Posts: 922
Location: Nowhere

Posted: Sun May 24, 2009 6:14 pm    Post subject: Re: build optimizations with ioquake3

For a complete explanation of the compiler optimization flags, see the GCC documentation:
- Optimize Options
- i386 and x86_64 Options

I use these optimizations :

Code::

    OPTIMIZE = -O3 -march=native -mtune=native -maccumulate-outgoing-args \
                       -mno-push-args -fomit-frame-pointer -ffast-math \
                       -falign-loops=4 -falign-jumps=4 -mfpmath=sse -Wa,-mtune=pentium4 \
                       -Wa,--reduce-memory-overheads -mieee-fp \
                       -fstrength-reduce -falign-functions=4 -fprefetch-loop-arrays \
                       -fstack-protector -fstack-protector-all -D_FORTIFY_SOURCE=2 \
                       -fsched2-use-traces -fivopts

LDFLAGS="-Wl,-O2"


Pay attention to the -mfpmath=sse option: it forces the use of the SSE floating-point unit on i386 machines (Pentium III and later) instead of the i387 math coprocessor. If you compile an engine with this flag and get frequent crashes, recompile it without this option. For me it currently works in OA, but not in ioquake3 (it works perfectly locally, but I can't join any online Q3 server), and not in the ioUrbanTerror client compiled with the sources downloaded from here. This is certainly true for "real" x86 machines; I don't know what could happen on an x86_64 machine running i386 code.

Pay attention as well to -Wa,-mtune=pentium4: the -Wa,<option> flag passes <option> to the GNU assembler (as). The assembler version on my machine accepts only these CPU tunings:

Code::
// as --help output
.....
  -march=CPU/-mtune=CPU   generate code/optimize for CPU, where CPU is one of:
                           i386, i486, pentium, pentiumpro, pentium4, nocona,
                           core, core2, k6, athlon, k8, generic32, generic64

I pass "pentium4" when compiling for my machine (a Pentium M) because the Pentium M is broadly compatible with the very first Pentium 4 generation; newer GNU as versions may accept more specific tunings.

If you're building for the i386 arch on an x86_64 machine, compilation may fail with the -march=native -mtune=native gcc options. Use an explicit CPU whose instruction set your x86_64 processor supports instead (e.g. pentium3, pentium4...).
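To see which of these flags actually took effect in a given build, you can ask the compiler through its predefined macros instead of guessing. A minimal sketch, assuming GCC (clang predefines most of the same macros); the function names are made up for illustration, and the file has to be compiled with the same OPTIMIZE string as the engine for the answer to mean anything:

```c
#include <assert.h>
#include <stdio.h>

/* Which unit does scalar float math use in this build?
   GCC predefines __SSE_MATH__ when -mfpmath=sse is in effect
   (the default on x86_64); otherwise x86 builds use the x87 unit. */
const char *float_math_unit(void)
{
#if defined(__SSE_MATH__)
    return "sse";
#elif defined(__i386__) || defined(__x86_64__)
    return "x87";
#else
    return "other";
#endif
}

/* Target architecture selected at compile time (e.g. -m32 vs -m64). */
const char *target_arch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}

/* __FAST_MATH__ is predefined when -ffast-math is in effect. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* __OPTIMIZE__ is predefined at any optimization level other than -O0. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* One-line summary of the flags above. */
void print_build_flags(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "arch: %s, float math: %s, fast-math: %s, optimized: %s\n",
            target_arch(), float_math_unit(),
            built_with_fast_math() ? "yes" : "no",
            built_with_optimization() ? "yes" : "no");
}
```

Calling print_build_flags(stdout) from a throwaway main() shows, for example, whether float math ended up on SSE or on the x87 unit for a given OPTIMIZE string.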

Using GCC 4.4.0 (the Ubuntu 9.10 default compiler) may help to increase overall performance: multimedia.cx/eggs/las...or-awhile/

This is my Makefile.local :

Code::
USE_CURL_DLOPEN=1
USE_OPENAL_DLOPEN=1

USE_MUMBLE=0
USE_VOIP=0
USE_INTERNAL_SPEEX=0

More details, information and experiments in this post.
hyp3rfocus
Forum Addict


Joined: Aug 25, 2007
Posts: 466
Location: england

Posted: Sun May 24, 2009 9:11 pm    Post subject: Re: build optimizations with ioquake3

wow, thanks falkland. that's some really helpful stuff. :grin:
kernel_panic
Übergod


Joined: Aug 28, 2007
Posts: 751
Location: uk

Posted: Mon May 25, 2009 12:24 am    Post subject: Re: build optimizations with ioquake3

Don't forget to post what the performance gains are.
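For comparing builds, the usual approach is to play the same demo with each binary (e.g. with timedemo) and compare the average fps. A trivial sketch of the arithmetic; the function names are made up for illustration:

```c
#include <assert.h>

/* Average frame rate of one timedemo run: frames rendered / seconds taken. */
double average_fps(int frames, double seconds)
{
    assert(frames >= 0 && seconds > 0.0);
    return frames / seconds;
}

/* Relative change between two builds, in percent (positive means faster). */
double fps_gain_percent(double fps_before, double fps_after)
{
    assert(fps_before > 0.0);
    return (fps_after - fps_before) / fps_before * 100.0;
}
```

For example, 100 fps before and 112.5 fps after is a 12.5% gain; a percentage is easier to compare across machines than a raw fps difference.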

_________________
"Fuelling off topic babble since day 1."
hyp3rfocus
Forum Addict


Joined: Aug 25, 2007
Posts: 466
Location: england

Posted: Thu May 28, 2009 9:26 pm    Post subject: Re: build optimizations with ioquake3

i've played with this for a bit and to be honest i can't really see any obvious fps increase, but it's definitely a lot less buggy than the iourbanterror engine. i'd definitely recommend doing this.

i might try building it with some of the features like voip. that could be interesting to play with.
Falkland
Übergod


Joined: Aug 01, 2008
Posts: 922
Location: Nowhere

Posted: Thu May 28, 2009 11:45 pm    Post subject: Re: build optimizations with ioquake3

I don't use VOIP because, even though its bandwidth consumption is very low when it's in use, it uses CPU cycles that are precious (for me), and the engine binary is smaller without it (it compiles and statically links libspeex and mumble). I don't know if there's a way to add DLOPEN support for the local libspeex and mumble (which both come with every Linux distribution); I haven't tried yet.
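As far as I can tell, the USE_*_DLOPEN options in Makefile.local work by loading the library at run time (on Linux, ultimately through dlopen()) instead of linking it at build time. Whether ioquake3's speex code could be switched over like that is untested here; the sketch below only shows the general POSIX pattern. It assumes a glibc-style system (link with -ldl on glibc older than 2.34), and the helper name load_optional_symbol is made up for illustration, not ioquake3 code:

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Try to load `libname` at run time and resolve `symbol` from it.
   On success: returns the symbol's address and stores the library handle
   in *handle_out (release it later with dlclose).
   On failure: returns NULL and leaves *handle_out untouched, so the caller
   can simply keep the optional feature (e.g. VoIP) switched off. */
void *load_optional_symbol(const char *libname, const char *symbol,
                           void **handle_out)
{
    assert(libname != NULL && symbol != NULL && handle_out != NULL);

    void *handle = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "optional library unavailable: %s\n", dlerror());
        return NULL;
    }

    dlerror();                          /* clear any stale error state */
    void *sym = dlsym(handle, symbol);
    const char *err = dlerror();        /* non-NULL means the lookup failed */
    if (err != NULL) {
        fprintf(stderr, "symbol %s not found: %s\n", symbol, err);
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return sym;
}
```

A caller might try load_optional_symbol("libspeex.so.1", "speex_encoder_init", &handle) and disable VoIP when it returns NULL; the soname is an assumption that holds on many distributions but not necessarily all.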

I've found that also disabling OpenAL increases fps (by 10-15 or more), but at the cost of falling back to the SDL sound backend, whose sound spatialization is poor. Of course, disabling sound altogether increases fps a little more, at an obvious cost.

On the rendering side, as the Xreal project guys state here, we can never get a significant fps increase because of an ancient software technology.
hyp3rfocus
Forum Addict


Joined: Aug 25, 2007
Posts: 466
Location: england

Posted: Fri May 29, 2009 6:28 am    Post subject: Re: build optimizations with ioquake3

well, the good news is the ioquake3 guys have noticed xreal's vbo stuff and are thinking about attempting to use it in ioquake3. they discuss it in this thread. hopefully it gets implemented and ioquake3 gets a major speed boost. :grin:
Falkland
Übergod


Joined: Aug 01, 2008
Posts: 922
Location: Nowhere

Posted: Fri May 29, 2009 6:50 pm    Post subject: Re: build optimizations with ioquake3

hyp3rfocus wrote:
well, the good news is the ioquake3 guys have noticed xreal's vbo stuff and are thinking about attempting to use it in ioquake3. they discuss it in this thread. hopefully it gets implemented and ioquake3 gets a major speed boost. :grin:

I've read tons of pages starting from that thread... I've found 3 or 4 patches to apply to the ioquake3 engine; let's see if at least one of them works :grin:
Falkland
Übergod


Joined: Aug 01, 2008
Posts: 922
Location: Nowhere

Posted: Wed Jun 03, 2009 3:06 pm    Post subject: Re: build optimizations with ioquake3

mmm... it seems that adding VBO support to ioquake3 while maintaining Q3A compatibility is very hard work: icculus.org/pipermail/...02466.html
Falkland
Übergod


Joined: Aug 01, 2008
Posts: 922
Location: Nowhere

Posted: Mon Jun 08, 2009 5:16 pm    Post subject: Re: build optimizations with ioquake3

For whoever is still interested:

Code::
    OPTIMIZE = -O3 -march=pentium-m -mtune=pentium-m -maccumulate-outgoing-args \
                       -mno-push-args -fomit-frame-pointer -ffast-math \
                       -falign-loops=4 -falign-jumps=4 -mfpmath=sse -Wa,-mtune=pentium4 -mieee-fp \
                       -fstrength-reduce -falign-functions=4 -fprefetch-loop-arrays \
                       -fstack-protector -fstack-protector-all -D_FORTIFY_SOURCE=2 \
                       -fsched2-use-traces -fivopts -fsched2-use-superblocks -fsee -ftracer 

LDFLAGS="-Wl,-O2"

I've updated the CFLAGS, adding -fsched2-use-superblocks -fsee -ftracer and removing -Wa,--reduce-memory-overheads, since that one does not perform any optimization on the code: it only reduces memory usage while the assembler is running.

I've also switched the -march/-mtune flags from native to the specific CPU, because native doesn't (yet) recognize the processor exactly.

I've successfully compiled an oa_ded and an openarena client binary on a Debian sid machine (default compiler gcc-4.3.3), but I got runnable engines _ONLY_ after removing the -fstack-protector-all flag (the binaries compiled with this flag keep crashing). A strange and weird bug, maybe related to the problem they have with glibc-2.9... EDIT: this bug also occurs on another machine using gcc-4.3 and gcc-4.4 with an older glibc version, so I guess it's a problem related to gcc >= 4.3.

Tested with both gcc-4.3.3 and gcc-4.4.

Same thing for ioquake3 (where I didn't use the -mfpmath=sse flag, as explained in a previous reply).
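If you want a build to warn you when it is being compiled with the combination that crashed here, the compiler's predefined macros can catch it. A sketch assuming GCC, which predefines __SSP__, __SSP_STRONG__ (gcc >= 4.9 only) or __SSP_ALL__ depending on the -fstack-protector variant; GCC_AT_LEAST and stack_protector_level are made-up names:

```c
/* True when compiled by GCC (not clang, which also defines __GNUC__)
   at version >= maj.min. */
#if defined(__GNUC__) && !defined(__clang__)
#  define GCC_AT_LEAST(maj, min) \
     (__GNUC__ > (maj) || (__GNUC__ == (maj) && __GNUC_MINOR__ >= (min)))
#else
#  define GCC_AT_LEAST(maj, min) 0
#endif

/* Flag, at compile time, the combination reported to crash in this thread. */
#if GCC_AT_LEAST(4, 3) && defined(__SSP_ALL__)
#  warning "-fstack-protector-all with gcc >= 4.3: binaries built this way crashed in the reports in this thread"
#endif

/* 0 = no stack protector, 1 = -fstack-protector,
   2 = -fstack-protector-strong, 3 = -fstack-protector-all */
int stack_protector_level(void)
{
#if defined(__SSP_ALL__)
    return 3;
#elif defined(__SSP_STRONG__)
    return 2;
#elif defined(__SSP__)
    return 1;
#else
    return 0;
#endif
}
```

Dropping this into any engine source file is enough for the warning; stack_protector_level() is only there if you also want to log the setting at run time.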
jackthompson
Admin


Joined: Aug 15, 2007
Posts: 1302
Location: Here

Posted: Tue Jun 09, 2009 12:05 am    Post subject: Re: build optimizations with ioquake3

remove/comment out the use of textures, shaders and also lights (using ambient instead) when rendering the bsp. then the polygons can be shaded based on the direction of their normals (flat shading), just by turning the XYZ values into RGB, then possibly greyscale, and changing the colour tone of it... or so..
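A rough sketch of the colour part of that idea, assuming unit-length normals and made-up names (this is not ioquake3 renderer code): map each normal component from [-1, 1] to a byte, optionally collapse to greyscale with the Rec. 601 luma weights, then tint with a colour tone.

```c
#include <assert.h>

typedef struct { unsigned char r, g, b; } rgb_t;

/* Map one normal component from [-1, 1] to [0, 255], rounding to nearest. */
static unsigned char component_to_byte(float n)
{
    float c = (n + 1.0f) * 0.5f * 255.0f;
    if (c < 0.0f)   c = 0.0f;      /* guard against slightly non-unit normals */
    if (c > 255.0f) c = 255.0f;
    return (unsigned char)(c + 0.5f);
}

/* XYZ of a unit normal -> RGB: faces pointing the same way get the same colour. */
rgb_t normal_to_rgb(const float n[3])
{
    assert(n != 0);
    rgb_t c = { component_to_byte(n[0]), component_to_byte(n[1]),
                component_to_byte(n[2]) };
    return c;
}

/* Optional greyscale step using Rec. 601 luma weights. */
unsigned char rgb_to_grey(rgb_t c)
{
    float y = 0.299f * c.r + 0.587f * c.g + 0.114f * c.b;
    return (unsigned char)(y + 0.5f);
}

/* Tint a grey level with a colour tone: 255 gives the tone itself, 0 gives black. */
rgb_t apply_tone(unsigned char grey, rgb_t tone)
{
    rgb_t c = { (unsigned char)((grey * tone.r + 127) / 255),
                (unsigned char)((grey * tone.g + 127) / 255),
                (unsigned char)((grey * tone.b + 127) / 255) };
    return c;
}
```

Because the colour depends only on the face normal, every polygon facing the same direction gets one flat colour, which is what makes the result cheap compared to textures plus lightmaps.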
Falkland
Übergod


Joined: Aug 01, 2008
Posts: 922
Location: Nowhere

Posted: Tue Jun 09, 2009 8:54 pm    Post subject: Re: build optimizations with ioquake3

jackthompson wrote:
remove/comment out the use of textures, shaders and also lights (using ambient instead) when rendering the bsp. then the polygons can be shaded based on the direction of their normals (flat shading), just by turning the XYZ values into RGB, then possibly greyscale, and changing the colour tone of it... or so..

Eh ??? .... :/

Is this a reply to the previous post (about integrating VBO support into ioq3), or is it a way to answer a statement in Klingon with another statement in Romulan?
kernel_panic
Übergod


Joined: Aug 28, 2007
Posts: 751
Location: uk

Posted: Tue Jun 09, 2009 10:41 pm    Post subject: Re: build optimizations with ioquake3

Quote::
Is this a reply to the previous post (about integrating VBO support into ioq3), or is it a way to answer a statement in Klingon with another statement in Romulan?

...said the guy who speaks like this:

Code::
/*
 *======================================================================
 */
#include <stdio.h>
#include <memory.h>
#include <math.h>

#include "globalp.c.h"

#define PANELSIZE 10

#define max(a,b) ((a) > (b) ? (a) : (b))
#define min(a,b) ((a) < (b) ? (a) : (b))

/*
   Internal PeIGS routine

   modified gram-schmidt, 1-D systolic panelized version
   */

void mgs_3( n, colF, mapF, b1, bn, nvecsZ, first, first_buf, iscratch, scratch)
     Integer *n, mapF[], *b1, *bn, *nvecsZ, *first, iscratch[];
     DoublePrecision **colF, first_buf[], scratch[];
{
  /*
   */

  /*
     n = number of vector to be orthogonalized
     b1 = beginning of vector subscript
     bn = end of vector subscript
     (e.g. colF[i][b1:bn] )
     colF = double pointer to the matrix
     mapF = array describing the processors holding columns of F
     first = 1, then mgs only my local block
                otherwise do full mgs.
     iscratch = integer scratch space
     scratch = double precision scratch space
     */

  static Integer IONE = 1, MONE=(DoublePrecision) -1.0e0;
  Integer jndx, vec_len;
  Integer i, k, me, isize, indx, kndx;
  Integer nvecs_in, nvecs, iii;
  Integer j, bb, nproc;
  Integer rsize, miter, mvecs, kk, itype, iter, iremain;

  Integer *mapvecF;
  Integer *iscrat, *proclist, naproc;
  Integer *mapvec_in, me_indx;

  DoublePrecision *buffer, *in_buffer;
  DoublePrecision t, *dptr, *dptr1;

  /*
    blas calls
    */

  extern void dcopy_(), daxpy_(), dscal_();
  extern DoublePrecision ddot_(), dnrm2_();

  /*
    mxsubs calls
    */

  extern Integer mxwrit_(),  mxread_();
  extern Integer count_list();
  extern Integer fil_mapvec_();
  extern Integer mxmynd_(), mxnprc_();
  extern Integer reduce_list4();
  extern Integer indxL();

  /*
    at this point inputs are minimally acceptable

    check to see if mapF are the same set of processors
    */

  me = mxmynd_();
  naproc = mxnprc_();


#ifdef DEBUG1
  fprintf(stderr, " \n" );
  fprintf(stderr, " In mgs1b me = %d \n", me );
  fprintf(stderr, " \n" );
  i = *nvecsZ-1;
  for( iii = 0; iii < *n; iii++)
    if( mapF[iii] == me ) {
      i++;
      for( j = *b1; j <= *bn; j++)
       fprintf(stderr, " mgs1b me = %d vecZ[%d][%d] = %g \n",
                     me, iii, j, colF[i][j]);
    }

  fprintf(stderr, " \n" );
  for( iii = 0; iii < *n; iii++)
  fprintf(stderr, " mgs1b me = %d mapZ[%d] = %d \n", me, iii, mapF[iii]);
  fprintf(stderr, " in mgs1b me = %d \n", me);
#endif

  vec_len = *bn - *b1 + 1;

  iscrat = iscratch;

  mapvecF = iscrat;
  nvecs = fil_mapvec_( &me, n, mapF, mapvecF );

  bb = *b1;

  proclist = iscrat;
  nproc = reduce_list4( *n, mapF, proclist, iscrat + naproc );

#ifdef DEBUG1
  fprintf(stderr, " me = %d mgs1b nprocs = %d n = %d \n", me , nproc, *n);
#endif

  if( *first == 1  || nproc == 1 ) {

    /*
     * mgs local block and return
     */

    k = *nvecsZ;
    for ( jndx = k; jndx < k + nvecs; jndx++ ){
        dptr = &colF[jndx][bb];
        t = dnrm2_( &vec_len, dptr, &IONE );
        t = (DoublePrecision) 1.0e0/t;
        dscal_( &vec_len, &t, dptr, &IONE);
        for ( indx = jndx + 1; indx < k + nvecs; indx++ ){
          dptr1 = &colF[indx][bb];
          t = -ddot_( &vec_len, dptr, &IONE, dptr1, &IONE );
          daxpy_( &vec_len, &t, dptr, &IONE, dptr1, &IONE );
        }
    }
    return;
  }

  iscrat += nproc;
  mapvec_in = iscrat;

  me_indx = indxL( me, nproc, proclist);

  buffer = (DoublePrecision *) scratch;
  in_buffer = buffer;

  k = *nvecsZ;

  /*
   *  MGS my part of cluster againt the first part of the cluster,
   *  which is stored in first_buf.
   */

  nvecs_in = fil_mapvec_( &proclist[0], n, mapF, mapvec_in);

  if( me == proclist[1] ) {
    isize = vec_len * nvecs_in;
    dcopy_( &isize, first_buf, &IONE, in_buffer, &IONE);
  }

  if( nproc > 2 ) {
    rsize = nvecs_in * vec_len * sizeof(DoublePrecision);
  }

  dptr = in_buffer;
  for ( iii = k; iii < k + nvecs; iii++ ){
    dptr = &colF[iii][bb];
    dptr1 = in_buffer;
    for ( jndx = 0; jndx < nvecs_in; jndx++ ){
      t = -ddot_( &vec_len, dptr, &IONE, dptr1, &IONE);
      daxpy_( &vec_len, &t, dptr1, &IONE, dptr, &IONE );
      dptr1 += vec_len;
    }
  }

  /*
   *  MGS the rest of the cluster.
   */

  for ( i = 1; i < nproc - 1; i++ ) {

    if ( me_indx < i )
      break;

    kndx = proclist[i];

    nvecs_in = fil_mapvec_( &kndx, n, mapF, mapvec_in);

    miter   = nvecs_in / PANELSIZE;
    iremain = nvecs_in - miter * PANELSIZE;

    if( miter * PANELSIZE != nvecs_in )
       miter++;

    mvecs = PANELSIZE;
    rsize = mvecs * vec_len * sizeof(DoublePrecision);

    kk    = k;
    itype = 11113;
    for ( iter = 1; iter < miter + 1; iter++ ){

      if( iter == miter && iremain > 0 ) {
        mvecs = iremain;
        rsize = mvecs * vec_len * sizeof(DoublePrecision);
      }

      if ( kndx == me ) {

        for ( jndx = kk; jndx < kk + mvecs; jndx++ ){
          dptr = &colF[jndx][bb];
          t = dnrm2_( &vec_len, dptr, &IONE );
          t = 1.0e0/t;
          dscal_( &vec_len, &t, dptr, &IONE);
          for ( indx = jndx + 1; indx < kk + mvecs; indx++ ){
            dptr1 = &colF[indx][bb];
            t = ddot_( &vec_len, dptr, &IONE, dptr1, &IONE );
            t *= MONE;
            daxpy_( &vec_len, &t, dptr, &IONE, dptr1, &IONE );

as previously pointed out.

_________________
"Fuelling off topic babble since day 1."
Falkland
Übergod


Joined: Aug 01, 2008
Posts: 922
Location: Nowhere

Posted: Wed Jun 10, 2009 2:38 pm    Post subject: Re: build optimizations with ioquake3

Another note on gcc >= 4.3 ...

Since gcc >= 4.3 automagically enables -ftree-vectorize at optimization level 3 (-O3), it will produce SSE code if it detects an SSE-capable CPU (>= Pentium III).

And this will also happen with the default ioquake3 OPTIMIZE string:

Code::
    OPTIMIZE = -O3 -march=i586 -fomit-frame-pointer -ffast-math \
      -funroll-loops -falign-loops=2 -falign-jumps=2 \
      -falign-functions=2 -fstrength-reduce

So if the next OA release is compiled with gcc >= 4.3 on a CPU >= Pentium III, the final binary WILL NOT RUN on a Pentium, Pentium Pro, Pentium II, AMD K6, AMD K7 (Thunderbird), etc., unless -fno-tree-vectorize is explicitly added to the OPTIMIZE string.
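Whether a particular binary really assumes SSE can be checked rather than guessed: GCC predefines __SSE__/__SSE2__ only when the target selected with -march/-msse enables those instructions. For a distributed binary, a start-up check can turn the crash into a readable error. A sketch under these assumptions: the function names are made up, __builtin_cpu_init/__builtin_cpu_supports need gcc >= 4.8 or clang (so not the 2009 compilers discussed here), and the check only helps if the file holding it and main() are themselves built without the SSE-enabling flags, so that nothing vectorised runs before it.

```c
#include <stdio.h>

/* Returns 1 if the running CPU supports the SIMD extensions this binary
   was compiled to assume, 0 otherwise. */
int cpu_can_run_this_build(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_cpu_init();              /* fill in the CPU model/feature data */
#  if defined(__SSE__)
    if (!__builtin_cpu_supports("sse"))
        return 0;
#  endif
#  if defined(__SSE2__)
    if (!__builtin_cpu_supports("sse2"))
        return 0;
#  endif
#endif
    return 1;                          /* nothing to check on other architectures */
}

/* Print an explanation instead of dying with "Illegal instruction". */
int check_cpu_or_explain(FILE *err)
{
    if (cpu_can_run_this_build())
        return 1;
    fprintf(err, "this binary assumes SSE/SSE2, which this CPU lacks\n");
    return 0;
}
```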

Hopefully nobody is still using such an obsolete machine.
All times are GMT + 1 Hour
Page 1 of 2
The Rainbow Networks website is hosted by JockeTF and Soder on furver.se.
