Discussion:
Parallel processing with Kubuntu 10.04 woes
Patrick Connolly
2010-10-03 09:03:34 UTC
My installation of Kubuntu 10.04 on a dual quad-core machine has the
impression it has 16 processors. When the same machine had Kubuntu
9.10, it accurately detected that there were 8 processors.

When I run gkrellm, it dutifully draws 16 krells and when I get some
parallel work happening, say running 6 parallel tasks, 6 krells light
up for a time, then processing is rotated to another processor and a
different processor lights up.

It's a bit annoying cluttering up the screen with 16 krells when only
8 of them do anything, and I can't work out which 8 that is. (It's
definitely not simply the first 8, and I'm not sure it's always the
same 8.) But what is more inconvenient: it seems as though the OS
passes the task on to another processor which it thinks is there, and
while a non-existent processor has the task, nothing happens. So on
average, that's about half the time. The net effect is that unless
the task can be divided into more than 4 parallel tasks, there's no
advantage from the multiple processors at all. Even using 6 tasks, it's
only marginally faster than doing them all in sequence.

My guess is that if I could find a configuration file that I could
directly edit, I might be able to tell Kubuntu how to count.

Is that likely, or is there a more intelligent way to go about it?

(More details as to just what I'm parallelizing can be supplied if
it's needed.)

TIA
--
___ Patrick Connolly
{~._.~}
_( Y )_ Good judgment comes from experience
(:_~*~_:) Experience comes from bad judgment
(_)-(_)


_______________________________________________
NZLUG mailing list ***@linux.net.nz
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
Robin Sheat
2010-10-03 09:18:53 UTC
Post by Patrick Connolly
My installation of Kubuntu 10.04 on a dual quad-core machine has the
impression it has 16 processors. When the same machine had Kubuntu
9.10, it did accurately detect that there were 8 processors.
Are you sure it's not showing hyperthreading differently between the different
versions? Or that it's activating hyperthreading when it hadn't previously?
Often that can be turned off in the BIOS, if that's what it is. It will,
though, give you a bit of a speed boost in many workloads, so aside from the
visual clutter, it may be an advantage.

More definitively, what does /proc/cpuinfo say?
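
For instance (assuming a standard Linux /proc), something like:

```shell
# How many logical processors does the kernel think it has?
grep -c '^processor' /proc/cpuinfo

# Does the CPU advertise the hyperthreading flag? (prints a flags
# line containing "ht" if so; prints nothing otherwise)
grep -w ht /proc/cpuinfo | head -n 1
```

With HT on, the first command should print twice the number of
physical cores.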

Robin.
Dagan McGregor
2010-10-03 10:23:01 UTC
Post by Robin Sheat
Post by Patrick Connolly
My installation of Kubuntu 10.04 on a dual quad-core machine has the
impression it has 16 processors.
More definitively, what does /proc/cpuinfo say?
If /proc/cpuinfo shows the 'ht' processor flag, I'd assume the newer
Kubuntu kernels enable hyperthreading where previously it was disabled.

You could try building your own kernel, without hyperthreading support,
if it is a problem.

Also check if there is a BIOS option to disable hyperthreading.

Cheers,
Dagan


Patrick Connolly
2010-10-04 08:27:42 UTC
Somewhere about Sun, 03-Oct-2010 at 10:18PM +1300 (give or take), Robin Sheat wrote:

|> On Sunday 3 October 2010 at 22:03:34, Patrick Connolly wrote:
|> > My installation of Kubuntu 10.04 on a dual quad-core machine has the
|> > impression it has 16 processors. When the same machine had Kubuntu
|> > 9.10, it did accurately detect that there were 8 processors.
|>

|> Are you sure it's not showing hyperthreading differently between
|> the different versions? Or, it's activating hyperthreading when it
|> hadn't previously. Often, that can be turned off in the BIOS if

Thanks for alerting me to that possibility.

|> that's what it is. Although, it will give you a bit of a speed
|> boost in many workloads, so aside from visual clutter, it may be an
|> advantage.

I think, for the software I'm using, it's probably better without.


|>
|> More definitively, what does /proc/cpuinfo say?

It's rather long but I think this is the part (line breaks added):

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology
nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3
cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi
flexpriority ept vpid

And I see 'ht' is in there. More than likely it wasn't the case with
the previous version.

One thing that might explain some of the weirdness is the fact that
though all the "processor"s are listed in almost identical ways,
"processor" 8 has this:


model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
stepping : 5
cpu MHz : 2395.000
cache size : 8192 KB
physical id : 0


Whereas all the other 15 have this:

model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
stepping : 5
cpu MHz : 1596.000
cache size : 8192 KB
physical id : 0


I'd have expected them all to be what 'processor' 8 says. It might
make sense to the educated, but not to me.

It does "work" for what I'm doing, but now I notice that after the
first time krells light up for each working "CPU", they nearly all
disappear from the gkrellm display, which led me to believe that
everything had stopped working. Running top sometimes shows that
there are multiple instances of the software I'm parallelizing, but
it's seldom as many as I've started.

The job does come to an end more or less successfully, but gkrellm
doesn't seem to know much about what's going on. That could be that
gkrellm can't cope with the hyperthreading. It didn't have the
problem with ver 9.10. I'd not realized that I'd come to depend on
gkrellm to give me an idea of what was happening during any particular
procedure.

When next I boot the machine, I'll see what I can do with the BIOS to
disable the hyperthreading. My work colleagues have Windoze 7 dual
booting on the same machine, and it doesn't display 16 processors; but
I'm not all that surprised that it would think of things rather
differently.


My main remaining question is the difference between Processor 8 and
all the others. Does anyone have an explanation?

TIA
--
___ Patrick Connolly
{~._.~}
_( Y )_ Good judgment comes from experience
(:_~*~_:) Experience comes from bad judgment
(_)-(_)


Daniel Lawson
2010-10-04 18:56:48 UTC
Post by Patrick Connolly
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology
nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3
cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi
flexpriority ept vpid
And I see 'ht' is in there. More than likely it wasn't the case with
the previous version.
It's not clear - do you see 16 processor entries in /proc/cpuinfo? If
so, you definitely have HT turned on. If you only see 8, then there is
something else going wrong.

I was interested to see you say that when your tasks are allocated to the
"extra" CPUs, they appear to do no work. Even if both threads are fully
loaded, I'd expect processes on both to be able to actually make progress.
Post by Patrick Connolly
One thing that might explain some of the weirdness is the fact that
though all the "processor"s are listed in almost identical ways,
stepping : 5
cpu MHz : 2395.000
cache size : 8192 KB
physical id : 0
stepping : 5
cpu MHz : 1596.000
cache size : 8192 KB
physical id : 0
I'd have expected them all to be what 'processor' 8 says. It might
make sense to the educated, but not to me.
It's doing per-core speed throttling. You should be able to disable this
in the BIOS, or you could just leave it. Recent enough distros (like
Ubuntu) will handle this fine, and will also deal with the Turbo Boost
option available in your processors (where the system can temporarily
overclock a core if others are idle).
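
You can watch the throttling from userspace; a quick sketch (assuming the
kernel exposes the usual cpufreq files under /sys):

```shell
# Print each core's current clock as cpufreq reports it; idle cores
# will typically show a lower frequency than busy ones.
for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq; do
    [ -r "$f" ] && echo "$f: $(cat "$f") kHz" || true
done

# The governor doing the throttling (e.g. "ondemand" clocks idle
# cores down); not all kernels expose this, hence the fallback.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null || true
```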
Post by Patrick Connolly
It does "work" for what I'm doing, but now I notice that after the
first time krells light up for each working "CPU", they nearly all
disappear from the gkrellm display which led me to believe that
everything had stopped working. Running top sometimes shows that
there are multiple instances of the software I'm parallelizing but
it's seldom as many as I've started.
The job does come to an end more or less successfully, but gkrellm
doesn't seem to know much about what's going on. That could be that
gkrellm can't cope with the hyperthreading. It didn't have the
problem with ver 9.10. I'd not realized that I'd come to depend on
gkrellm to give me an idea of what was happening during any particular
procedure.
HT isn't magic - it just looks like more CPUs. Everything else is
abstracted. Gkrellm may well have an issue dealing with 16 CPUs;
however, I doubt very much it's anything specific to HT.
Post by Patrick Connolly
When next I boot the machine, I'll see what I can do with the BIOS to
disable the hyperthreading. My work colleagues have Windoze 7 dual
booting on the same machine but it doesn't display 16 processors, but
I'm not all that surprised that it would think of things rather
differently.
If HT is enabled on this system, windows will display it. It's likely
that you have it turned on in your BIOS, and he doesn't.
Post by Patrick Connolly
My main remaining question is the difference between Processor 8 and
all the others. Does anyone have an explanation?
See above.

As a general point, HT on Nehalem (Intel 55xx) and later CPUs is far,
far improved over the original Intel HT implementation on P4 processors.
I've seen a 60% performance increase by utilising all HT cores - not
100%, of course, but also significantly greater than the ~0% that was
typical on P4 HT. Your mileage *will* vary, depending on your workload,
but it may be worth your while investigating it more. Happy to point you
at some information about it, and about how to do proper process
scheduling on Nehalem CPUs.

Patrick Connolly
2010-10-05 09:31:33 UTC
Somewhere about Tue, 05-Oct-2010 at 07:56AM +1300 (give or take),
Daniel Lawson wrote:

[...]
Post by Daniel Lawson
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est
tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida
tpr_shadow vnmi flexpriority ept vpid
And I see 'ht' is in there. More than likely it wasn't the case with
the previous version.
It's not clear - do you see 16 processor entries in /proc/cpuinfo? If
Yes, which is consistent with seeing 'ht' in the list of flags.
Post by Daniel Lawson
so, you definitely have HT turned on. If you only see 8, then there is
something else going wrong.
We don't have that problem, but we have something else.

[....]
Post by Daniel Lawson
I was interested that you said that when your tasks are allocated to
the "extra" cpus, they appear to do no work. Even if both threads
are fully loaded I'd expect processes on both to be able to actually
process.
They appear to indicate that nothing is happening, judging by what
gkrellm and top show. But it's not as simple as some notional CPUs
being recognised and others not. Starting fresh, almost straight away
I'll see 6 krells showing 100% CPU activity for about 10 seconds; then
some will shift to other CPUs, but then they nearly all drop to
something less than 5%. That seems to be more or less what top shows
also. However, the tasks do finish eventually.

I thought I'd try to do the same thing using MPI. It's similar in
many respects but takes about three times longer. There is one big
difference: one krell will continuously show almost 100% for the 'sys
time' line -- something that didn't happen before. It spends about 2
minutes on any one CPU, compared with about 10 seconds for the user
time lines (when anything is visible there).

[...]
Post by Daniel Lawson
It's doing per-core speed throttling. You should be able to disable
So every core is at 2/3 throttle except half of 1? Curiouser and
curiouser.
Post by Daniel Lawson
this in the BIOS, or you could just leave it. Recent enough distros
(like Ubuntu) will handle this fine, and will also deal with the
Turboboost option available in your processors (where the system can
temporarily overclock a core if others are idle)
[...]
Post by Daniel Lawson
HT isn't magic - it just looks like more CPUs. Everything else is
abstracted. Gkrellm may well have an issue dealing with 16 CPUs however,
but I doubt very much it's anything specific to HT.
Then top would show 6 processes until they finish, but most of the
time, it doesn't. It might be the specific parallel implementations
I'm using. I get the impression that the parallelizing works, but
things go haywire when processing shifts from one "processor" to
another. It always seems to start what I'd call "properly", but it
soon peters out. It didn't do that when HT wasn't enabled.
Post by Daniel Lawson
When next I boot the machine, I'll see what I can do with the BIOS
to disable the hyperthreading. My work colleagues have Windoze 7
dual booting on the same machine but it doesn't display 16
processors, but I'm not all that surprised that it would think of
things rather differently.
If HT is enabled on this system, windows will display it. It's
likely that you have it turned on in your BIOS, and he doesn't.
It's the same machine, not another of the same model, so the BIOS must
be identical. There might be a grub setting that Linux uses that
switches it on. Is that possible? (I'm completely lost with grub2.)

[...]
Post by Daniel Lawson
....... Happy to point you at some information about it and about
how to do proper process scheduling on Nehalem cpus.
Thanks for the offer, but I don't work at that level. I write R code
which uses different packages, and it's the way those packages do
the parallelizing that's a bit beyond me.
--
___ Patrick Connolly
{~._.~}
_( Y )_ Good judgment comes from experience
(:_~*~_:) Experience comes from bad judgment
(_)-(_)


Patrick Connolly
2010-10-07 04:46:03 UTC
Somewhere about Sun, 03-Oct-2010 at 10:18PM +1300 (give or take), Robin Sheat wrote:


|> On Sunday 3 October 2010 at 22:03:34, Patrick Connolly wrote:
|> > My installation of Kubuntu 10.04 on a dual quad-core machine has
|> > the impression it has 16 processors. When the same machine had
|> > Kubuntu 9.10, it did accurately detect that there were 8
|> > processors.
|>
|> Are you sure it's not showing hyperthreading differently between
|> the different versions? Or, it's activating hyperthreading when it
|> hadn't previously. Often, that can be turned off in the BIOS if
|> that's what it is. Although, it will

Great. It made all the difference. I found where to switch it off in
the BIOS and my code started working properly again.

Thanks for the suggestion.


|> give you a bit of a speed boost in many workloads, so aside from visual
|> clutter, it may be an advantage.

Whatever is wonderful about hyperthreading, it doesn't suit my sort of
parallelizing.
--
___ Patrick Connolly
{~._.~}
_( Y )_ Good judgment comes from experience
(:_~*~_:) Experience comes from bad judgment
(_)-(_)


Guy K. Kloss
2010-10-07 05:43:59 UTC
Post by Patrick Connolly
Whatever is wonderful about hyperthreading, it doesn't suit my sort of
parallelizing.
Hehe, that used to be my opinion in the past. But then I read an article in
an IEEE publication on parallelisation. It was really interesting and a huge
eye opener. Anyway, hyperthreading can make much more sense for certain types
of problems than for others. After all, there must be a reason why Sun built
their SPARC architecture CPUs with eight HT units per core.

If only I knew where *exactly* I'd seen that damn article, 'cause I
recently discussed this topic with someone, and I couldn't recall the exact
reasoning why HT does make sense and kick arse big time ... :-/

Guy
--
Guy K. Kloss
Institute of Information and Mathematical Sciences
Te Kura Pūtaiao o Mōhiohio me Pāngarau
Massey University, Albany (North Shore City, Auckland)
473 State Highway 17, Gate 1, Mailroom, Quad B Building
voice: +64 9 414-0800 ext. 9266 fax: +64 9 441-8181
***@massey.ac.nz http://www.massey.ac.nz/~gkloss
Patrick Connolly
2010-10-07 06:57:05 UTC
Somewhere about Thu, 07-Oct-2010 at 06:43PM +1300 (give or take), Guy K. Kloss wrote:

|> On Thu, 07 Oct 2010 17:46:03 Patrick Connolly wrote:
|> > Whatever is wonderful about hyperthreading, it doesn't suit my sort of
|> > parallelizing.
|>

|> Hehe, that used to be my opinion in the past. But then I read some
|> article in an IEEE publication on parallelisation. Was really
|> interesting and a huge eye opener. Anyway, hyperthreading can make
|> much more sense for certain types of problems, than for
|> others. After all, there must be a reason why Sun built their Sparc
|> architecture CPUs with eight HT units per core.

Wonder if that reason has anything to do with why Oracle has now taken
them over?

In fairness to Sun, I remember more than 15 years ago they were
talking about 'the network being the computer', a long time before
sufficient bandwidth and the term 'cloud computing' appeared.

|>
|> If I'd just know where I've seen that damn article *exactly*,
|> 'cause I recently discussed this topic with someone, and I couldn't
|> recall the exact reasoning why HT does make sense and kick arse big
|> time ... :-/
From what I was able to ascertain, it suits 3D graphics rendering but
not the sort of work I have, which is very iterative. In my case,
switching it off was like letting the dog off the chain. I suppose it
would be possible to rewrite everything so that it suited HT, but
that's more than a bit beyond my capabilities.
--
___ Patrick Connolly
{~._.~}
_( Y )_ Good judgment comes from experience
(:_~*~_:) Experience comes from bad judgment
(_)-(_)


Rob Connolly
2010-10-07 07:27:47 UTC
Post by Guy K. Kloss
Post by Patrick Connolly
Whatever is wonderful about hyperthreading, it doesn't suit my sort of
parallelizing.
Hehe, that used to be my opinion in the past. But then I read some article in
an IEEE publication on parallelisation. Was really interesting and a huge eye
opener. Anyway, hyperthreading can make much more sense for certain types of
problems, than for others. After all, there must be a reason why Sun built
their Sparc architecture CPUs with eight HT units per core.
If I'd just know where I've seen that damn article *exactly*, 'cause I
recently discussed this topic with someone, and I couldn't recall the exact
reasoning why HT does make sense and kick arse big time ... :-/
A simple explanation is that Hyper-threading provides two virtual cores
to the OS, by only duplicating the components necessary to maintain the
state of the process that is executing on each 'core' (basically the
registers, program counter, etc).

The neat thing comes in when your program needs to fetch a value
from main memory (i.e. quite often). Assuming this request misses all
the caches and goes all the way to main memory, it could take around
100 CPU cycles.

On a non-HT processor the CPU will be blocked waiting for the data
(discounting any form of pre-fetching optimisation undertaken by the
compiler), but with HT, the main parts of the CPU are used to execute
the other thread (which may just block on main memory again, but heh).

This is an overly simplistic explanation, but it should kinda give you
the idea. Basically you should see some sort of HT speed up on any
program which is fairly memory intensive (so that it generates a fair
number of cache misses) and can be parallelised into at least two threads.
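
Incidentally, you can see which logical CPUs are HT siblings (i.e. share
one physical core's execution units) from sysfs; a quick sketch:

```shell
# Each file lists the logical CPUs sharing that core; with HT on you
# see pairs like "0,8", with HT off just singletons like "0".
for f in /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list; do
    [ -r "$f" ] && echo "$f: $(cat "$f")" || true
done
```

That would also answer Patrick's earlier question about which 8 of the
16 krells are "real".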

Cheers,

Rob

Guy K. Kloss
2010-10-07 21:21:47 UTC
Post by Rob Connolly
A simple explanation is that Hyper-threading provides two virtual cores
to the OS, by only duplicating the components necessary to maintain the
state of the process that is executing on each 'core' (basically the
registers, program counter, etc).
The neat thing comes around when your program needs to fetch a value
from main memory (i.e. quite often). Assuming this request misses all
the caches and goes all the way to main memory this could take around
100 CPU cycles.
On a non-HT processor the CPU will be blocked waiting for the data
(discounting any form of pre-fetching optimisation undertaken by the
compiler), but with HT, the main parts of the CPU are used to execute
the other thread (which may just block on main memory again, but heh).
Yes, that was it. Now it dawns on me again ...

The reasoning went sort of as follows: putting in further execution pipelines
for HT requires far fewer transistors (and thus less chip real estate) than
full cores do. Also, with HT they don't need to (cumbersomely) save all the
registers, etc. when switching from one HT pipeline to the other, so thread
switching is much faster. So for certain applications it's far better to bump
up performance by providing a higher number of HT pipelines rather than
full-blown cores. A common example is servers. As Sun put lots of effort into
their virtualisation platform to aggregate many servers on one host, this just
makes sense. An OS (or in this case many of them) has to deal with many
processes, each of which is commonly only active for short periods of time, so
the switching becomes the overhead, not the execution. For those purposes HT
makes a lot of sense. For parallel computation, however, full cores are
usually much more sensible.

Thx for sparking my mind.

Guy
--
Guy K. Kloss
Institute of Information and Mathematical Sciences
Te Kura Pūtaiao o Mōhiohio me Pāngarau
Massey University, Albany (North Shore City, Auckland)
473 State Highway 17, Gate 1, Mailroom, Quad B Building
voice: +64 9 414-0800 ext. 9266 fax: +64 9 441-8181
***@massey.ac.nz http://www.massey.ac.nz/~gkloss
Robin Sheat
2010-10-07 09:54:05 UTC
Post by Patrick Connolly
Whatever is wonderful about hyperthreading, it doesn't suit my sort of
parallelizing.
I'd be somewhat surprised to see an overall slowdown as the result of
hyperthreading. My idea of how it works suggests that at the worst, you'd get
the speed of one core, and at the best, something like 180%. I don't think it
should ever be worse than a single core, unless there are some pathological
cases I don't know about.

Robin.
Patrick Connolly
2010-10-07 19:11:43 UTC
Somewhere about Thu, 07-Oct-2010 at 10:54PM +1300 (give or take), Robin Sheat wrote:

|> On Thursday 7 October 2010 at 17:46:03, Patrick Connolly wrote:
|> > Whatever is wonderful about hyperthreading, it doesn't suit my sort of
|> > parallelizing.
|>

|> I'd be somewhat surprised to see an overall slowdown as the result
|> of hyperthreading. My idea of how it works suggests that at the
|> worst, you'd get the speed of one core, and at the best, something
|> like 180%. I don't think it should ever be worse than a single
|> core, unless there are some pathological cases I don't know about.

I get the impression the problem lies in the changing from one
processor to another. It starts alright, but when a task goes to
another processor it's analogous to a slipping clutch: the engine's
revving, but there's nothing being driven. It's very slow to do
anything else with the 'spare' processors while in that state. With
HT off, it seems to be able to change gears effortlessly.

Why that would be happening, I've no idea.
--
___ Patrick Connolly
{~._.~}
_( Y )_ Good judgment comes from experience
(:_~*~_:) Experience comes from bad judgment
(_)-(_)


Robin Sheat
2010-10-07 19:39:55 UTC
Post by Patrick Connolly
I get the impression the problem lies in the changing from one
processor to another.
In general, that's something that should be avoided anyway. There are tools to
set CPU affinity: you can force a process to be on a particular CPU all the
time. Migrating them can be fairly expensive, as cache is usually local to the
cores that are on a single chip.
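
For example, with the `taskset` tool from util-linux (a sketch; the CPU
numbers are only illustrative):

```shell
# Start a command pinned to logical CPUs 0-7, leaving the HT
# siblings alone (here just a trivial command for illustration):
taskset -c 0-7 echo "pinned to CPUs 0-7"

# Show the affinity mask of a running process (here, this shell):
taskset -cp $$
```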

Robin.
Daniel Pittman
2010-10-08 09:47:50 UTC
This is sent while on the road, ironically in NZ, so may be a little delayed
by finding Internet access; sorry if the content is no longer relevant to the
discussion.
Post by Robin Sheat
I get the impression the problem lies in the changing from one processor to
another.
On a pair of HT cores that will make approximately zero difference in the real
world: your cores share L2 cache, and even mostly share L1 cache, so there is
pretty much no cost to moving threads between them.

This is what you would reasonably expect when you are creating the second
"core" from the same execution units as the first core.

The only thing that /might/ slow you down in that situation is additional
cache pressure caused by the OS scheduling other activity to the second
thread, which means that data may get evicted from the L1 caches more often.

OTOH, L2 and L3 cache is large, fast, and close to the CPU, so most code will
not really notice the difference.
Post by Robin Sheat
In general, that's something that should be avoided anyway. There are tools
Only extremely rare code running under Linux will benefit from that, because...
Post by Robin Sheat
you can force a process to be on a particular CPU all the time. Migrating
them can be fairly expensive, as cache is usually local to the cores that
are on a single chip.
... the Linux scheduler is well aware of the performance characteristics of
moving execution between HyperThreads, cores on the same socket (only L3 cache
is shared, but data transfer from socket-siblings is faster), and separate
sockets.

For almost all use you should allow the scheduler to deal with the issue of
where to run your code, and it will make reasonable efforts to keep you
working cache-hot.

Tuning is most commonly required if you have soft-RT scheduling requirements,
or for the occasional semi-pathological[1] situation like two cores and three
threads running flat-out.

Daniel

Footnotes:
[1] I can't think of a better way to put this. It isn't entirely
disastrous, but there really isn't a *good* solution to the situation
unless you can read minds.
--
✣ Daniel Pittman ✉ ***@rimspace.net ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons
