There is one file in the Cilk runtime that you might find interesting: cilk-sysdep.h, which contains the system-specific mappings for memory barriers. Here is a short section of it that is relevant to your question about x86 (i.e. i386).
File: cilk-sysdep.h (the numbers on the left-hand side are the line numbers in that file)
252 * We use an xchg instruction to serialize memory accesses, as can
253 * be done according to the Intel Architecture Software Developer's
254 * Manual, Volume 3: System Programming Guide
255 * (http://www.intel.com/design/pro/manuals/243192.htm), page 7-6,
256 * "For the P6 family processors, locked operations serialize all
257 * outstanding load and store operations (that is, wait for them to
258 * complete)." The xchg instruction is a locked operation by
259 * default. Note that the recommended memory barrier is the cpuid
260 * instruction, which is really slow (~70 cycles). In contrast,
261 * xchg is only about 23 cycles (plus a few per write buffer
262 * entry?). Still slow, but the best I can find. -KHR
263 *
264 * Bradley also timed "mfence", and on a Pentium IV xchgl is still quite a bit faster
265 * mfence appears to take about 125 ns on a 2.5GHZ P4
266 * xchgl apears to take about 90 ns on a 2.5GHZ P4
267 * However on an opteron, the performance of mfence and xchgl are both *MUCH MUCH BETTER*.
268 * mfence takes 8ns on a 1.5GHZ AMD64 (maybe this is an 801)
269 * sfence takes 5ns
270 * lfence takes 3ns
271 * xchgl takes 14ns
272 * see mfence-benchmark.c
273 */
274 int x=0, y;
275 __asm__ volatile ("xchgl %0,%1" :"=r" (x) :"m" (y), "0" (x) :"memory");
276 }
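What I liked about this is that xchgl appears to be faster than mfence :) at least on the Pentium 4; on the Opteron the figures above put mfence ahead (8 ns vs. 14 ns). Either way, you should really implement both and measure them on your own hardware.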
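To try this yourself, something along the following lines could serve as a starting point. It is only a minimal sketch, not the Cilk code and not the mfence-benchmark.c mentioned in the comment: the names `fence_xchg`, `fence_mfence` and `ns_per_call` are made up for illustration, it assumes GCC or Clang inline assembly, and on non-x86 targets it falls back to `__sync_synchronize()` purely so that it still compiles. Unlike the excerpt above, it marks both xchg operands as read-write, since the instruction modifies both.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime with strict -std= flags */
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical helper: full barrier via xchg with a memory operand,
 * which is implicitly locked on x86. */
static inline void fence_xchg(void)
{
#if defined(__i386__) || defined(__x86_64__)
    int reg = 0;
    int mem = 0;
    /* Both operands are read and written by xchg, hence "+r" and "+m". */
    __asm__ volatile ("xchgl %0,%1" : "+r" (reg), "+m" (mem) : : "memory");
#else
    __sync_synchronize();  /* assumption: portable fallback, not what Cilk does */
#endif
}

/* Hypothetical helper: full barrier via mfence (needs SSE2 on 32-bit x86). */
static inline void fence_mfence(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile ("mfence" : : : "memory");
#else
    __sync_synchronize();  /* assumption: portable fallback */
#endif
}

/* Average wall-clock cost of one call to `fence`, in nanoseconds.
 * The loop and function-call overhead are included in the result. */
static double ns_per_call(void (*fence)(void), long iters)
{
    struct timespec t0, t1;
    long i;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++)
        fence();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Calling `ns_per_call(fence_xchg, 10000000)` and `ns_per_call(fence_mfence, 10000000)` from your own `main` and printing the two results gives a rough comparison for your CPU. Treat the numbers as indicative only: an uncontended fence in a tight loop says little about its cost when the store buffer is full or other cores are touching the same cache lines.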
What does 'lwsync' do?