Discussion:
oh-no! kernel bug! freeze!
Atom Smasher
2011-03-29 09:34:38 UTC
i'm a veteran of freebsd, and a lot of people think i'm a linux guru, but
i'm not sure what's going on here...

screen-shot from a console - http://smasher.org/tmp/IMG_0546.JPG

luckily, that mess on the console includes the output from rsync up until
it went belly-up, so everything related to the problem is there.

so... i just installed ubuntu-10.04.2-alternate-amd64 on a lenovo T510.
installation went smooth and i was able to boot into a usable system and
install all updates as of this afternoon. then 2-3 times it froze, and
magic SysRq REISUB had no effect; i had to hold the power button. then,
after rebooting this evening, i just switched to a console and in the
middle of rsync-ing from another machine i've got "BUG: unable to handle
kernel NULL pointer dereference".

of course even without the GUI it still didn't respond to REISUB. it's
stone-dead.
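
When REISUB does nothing, one thing worth ruling out is whether the kernel.sysrq setting permits those keys at all. The following is a minimal sketch, assuming the bit values given in the kernel's sysrq documentation; the function name reisub_allowed is made up for illustration. A hard lockup with interrupts disabled will ignore SysRq whatever this setting says, so a permissive value does not prove the keys should have worked.

```shell
# Report which REISUB keys a given kernel.sysrq value permits.
# Bit values are assumed from the kernel's sysrq documentation:
#   4 = keyboard control (R), 64 = signalling processes (E, I),
#   16 = sync (S), 32 = remount read-only (U), 128 = reboot (B).
# A value of 1 means every SysRq function is enabled.
reisub_allowed() {
    mask=$1
    out=""
    for pair in R:4 E:64 I:64 S:16 U:32 B:128; do
        key=${pair%%:*}
        bit=${pair##*:}
        if [ "$mask" -eq 1 ] || [ $(( mask & bit )) -ne 0 ]; then
            out="$out$key"
        fi
    done
    printf '%s\n' "${out:-none}"
}

# Usage on a live system:
#   reisub_allowed "$(cat /proc/sys/kernel/sysrq)"
```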

since it's been running well until it crashes, i'm reluctant to just do a
reinstall. i guess i'll run memtest overnight...

any ideas??? thanks...
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"If someone created a database of all primes, won't he be
able to use that database to break public-key algorithms?
Yes, but he can't do it. If you could store one gigabyte
of information on a drive weighing one gram, then a list
of just the 512-bit primes would weigh so much that it
would exceed the Chandrasekhar limit and collapse into a
black hole... so you couldn't retrieve the data anyway"
-- Bruce Schneier, Applied Cryptography


_______________________________________________
NZLUG mailing list ***@linux.net.nz
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
Aidan Gauland
2011-03-29 10:00:52 UTC
Post by Atom Smasher
i'm a veteran of freebsd, and a lot of people think i'm a linux
guru, but i'm not sure what's going on here...
screen-shot from a console - http://smasher.org/tmp/IMG_0546.JPG
luckily, that mess on the console includes the output from rsync up
until it went belly-up, so everything related to the problem is
there.
so... i just installed ubuntu-10.04.2-alternate-amd64 on a lenovo
T510. installation went smooth and i was able to boot into a usable
system and install all updates as of this afternoon. then 2-3 times
it froze, and magic SysRq REISUB had no effect; i had to hold the
power button. then, after rebooting this evening, i just switched to
a console and in the middle of rsync-ing from another machine i've
got "BUG: unable to handle kernel NULL pointer dereference".
of course even without the GUI it still didn't respond to REISUB.
it's stone-dead.
since it's been running well until it crashes, i'm reluctant to just
do a reinstall. i guess i'll run memtest overnight...
any ideas??? thanks...
Last time I had a machine go haywire on me, I tried running several
different Live CDs. Preferably running different kernels. For
example, trying Ubuntu and KUbuntu aren't any different at the core,
so try, say, Ubuntu, Puppy, Knoppix, Slax, Finnix, and grml. If they
all crash, I'd say you almost certainly have a serious hardware
problem. Running a memtest is a good idea, though.

Hopefully someone else will be able to make sense of that kernel trace.

HTH,
Aidan Gauland
Atom Smasher
2011-03-29 10:11:15 UTC
Post by Aidan Gauland
Last time I had a machine go haywire on me, I tried running several
different Live CDs. Preferably running different kernels. For example,
trying Ubuntu and KUbuntu aren't any different at the core, so try, say,
Ubuntu, Puppy, Knoppix, Slax, Finnix, and grml. If they all crash, I'd
say you almost certainly have a serious hardware problem. Running a
memtest is a good idea, though.
=================

for the last few days i'd been running freebsd on it; no crashes.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"Just because something makes perfect sense doesn't mean
it is true."
-- Rory Miller, Meditations on Violence

Corollary: Just because something doesn't make any sense
doesn't mean it isn't true.
-- Atom Smasher


James Clark
2011-03-29 10:10:39 UTC
Post by Atom Smasher
screen-shot from a console - http://smasher.org/tmp/IMG_0546.JPG
luckily, that mess on the console includes the output from rsync up until it went belly-up, so everything related to the problem is there.
Looks very similar to this:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/681753

Fault is at the same point: mem_cgroup_del_lru_list+0x67

The active task in your case is kswapd, and I see shrink_zone in the stack trace. My guess would be that memory pressure is a trigger for this problem.

It doesn't look like Ubuntu are active on this - and LTS doesn't mean anything in my experience. If this problem repeats then I'd try updating to a later release - unless you want to spend time chasing this bug down.

-Jamie
Atom Smasher
2011-03-29 10:28:27 UTC
Post by James Clark
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/681753
Fault is at the same point: mem_cgroup_del_lru_list+0x67
============

indeed...
Post by James Clark
It doesn't look like Ubuntu are active on this - and LTS doesn't mean
anything in my experience. If this problem repeats then I'd try updating
to a later release - unless you want to spend time chasing this bug
down.
===============

the problem seems to be repeating within an hour or two of uptime.

where/how should this bug be officially reported?

is 10.10 running a different kernel than 10.04.2? or should i try a
PREEMPT kernel?

there's no time pressure to get this machine running, so i can run
diagnostics if it'll help someone else chase the bug.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"If you take out the killings, Washington
actually has a very very low crime rate."
-- M. Barry, Mayor of Washington, DC


Nevyn
2011-03-29 10:48:50 UTC
is 10.10 running a different kernel than 10.04.2? or should i try a PREEMPT
kernel?
I think 10.04 is currently using 2.6.32? There's a release candidate
for 2.6.35 which I found to be lacking a bunch of modules. For
example, for a particular laptop I was working on I needed 2.6.35 for
the sound to work properly (the microphone) but couldn't attempt
ndiswrapper as the module was missing.

I just found these instructions for a backport (for lucid?):
sudo add-apt-repository ppa:kernel-ppa/ppa
sudo apt-get update
sudo apt-get install linux-lts-backport-maverick

I think I downloaded the debs individually from the repository and
installed them with dpkg rather than using this method, so you might get
better mileage if you want to try going down this track.

Regards,
Nevyn
http://nevsramblings.blogspot.com/

James Clark
2011-03-29 14:20:12 UTC
Post by Atom Smasher
Post by James Clark
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/681753
Fault is at the same point: mem_cgroup_del_lru_list+0x67
[snip]
Post by Atom Smasher
the problem seems to be repeating within an hour or two of uptime.
where/how should this bug be officially reported?
My guess would be launchpad. The similar bug above (681753) is incomplete/undecided/unassigned. You could update that one or file a new, related one.
Post by Atom Smasher
is 10.10 running a different kernel than 10.04.2? or should i try a PREEMPT kernel?
10.10 is likely to be a couple of releases later. If you are coming from FreeBSD (a fine OS btw) then spending some time sifting through the Ubuntu mailing lists / forums / launchpad would be worthwhile. Some regular Ubuntu users will shine more light here.
Post by Atom Smasher
there's no time pressure to get this machine running, so i can run diagnostics if it'll help someone else chase the bug.
This is where I'll grumble about Ubuntu. I have found that the Ubuntu developers are really only interested in 'hot' bugs. LTS releases are not particularly important. Often the response to LTS bugs ends with a suggestion to upgrade. This negates the point of LTS somewhat.

The mainline kernel developers will *probably* not be interested in oopses in the Ubuntu kernel unless they also appear in mainline. Ubuntu roll a lot of things into their kernels and many bugs like this are build-specific.

If you want to help chase the bug then I'd first sift through kernel.org starting from the release in question. If nothing similar comes up there then this is probably Ubuntu-specific and you'll need to drop back to launchpad to seek resolution.

As others suggest: Try a later or different release before spending too much time on this. Chances are it's fixed.
Cliff Pratt
2011-03-31 07:19:13 UTC
Post by Atom Smasher
i'm a veteran of freebsd, and a lot of people think i'm a linux guru,
but i'm not sure what's going on here...
screen-shot from a console - http://smasher.org/tmp/IMG_0546.JPG
luckily, that mess on the console includes the output from rsync up
until it went belly-up, so everything related to the problem is there.
so... i just installed ubuntu-10.04.2-alternate-amd64 on a lenovo T510.
installation went smooth and i was able to boot into a usable system and
install all updates as of this afternoon. then 2-3 times it froze, and
magic SysRq REISUB had no effect; i had to hold the power button. then,
after rebooting this evening, i just switched to a console and in the
middle of rsync-ing from another machine i've got "BUG: unable to handle
kernel NULL pointer dereference".
of course even without the GUI it still didn't respond to REISUB. it's
stone-dead.
since it's been running well until it crashes, i'm reluctant to just do
a reinstall. i guess i'll run memtest overnight...
any ideas??? thanks...
Bugger! I've seen rsync die nastily before, but never has it taken down
the machine. I'd guess memory problems, as you suggest.

Cheers,

Cliff

Atom Smasher
2011-03-31 07:41:44 UTC
Post by Cliff Pratt
Bugger! I've seen rsync die nastily before, but never has it taken down
the machine. I'd guess memory problems, as you suggest.
====================

memtest: 18 pass - 0 fail

with 10.04.2 i booted into the 2.6.32-28-generic kernel (instead of the
2.6.32-30-generic that was freezing), recovery mode, single-user and ran
"do-release-upgrade". everything went well.

now (without a full reinstall) it's running ubuntu-10.10 with
2.6.35-28-generic kernel.
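
Since the exact kernel build matters when comparing crashes or filing a report, it can help to record the release string in pieces. A minimal sketch, assuming the Ubuntu-style "upstream-ABI-flavour" layout seen in this thread (for example 2.6.35-28-generic); the function name split_kernel_release is hypothetical and other distributions name their kernels differently.

```shell
# Split an Ubuntu-style kernel release string (as printed by `uname -r`)
# into upstream version, ABI number and flavour.
# Assumes the form UPSTREAM-ABI-FLAVOUR, e.g. 2.6.35-28-generic.
split_kernel_release() {
    rel=${1:-$(uname -r)}
    upstream=${rel%%-*}      # text before the first dash, e.g. 2.6.35
    rest=${rel#*-}           # text after the first dash, e.g. 28-generic
    abi=${rest%%-*}          # e.g. 28
    flavour=${rest#*-}       # e.g. generic
    printf 'upstream=%s abi=%s flavour=%s\n' "$upstream" "$abi" "$flavour"
}

# Usage: split_kernel_release            (uses the running kernel)
#        split_kernel_release 2.6.32-30-preempt
```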

with compiz it froze after about an hour, but i couldn't switch to a
console to see what happened. no REISUB. deja-vu.

right now it's under a light load on the console for 12+ hours.

i'm kind of hoping that it does freeze again, so i can take another
picture and file a bug report. on this hardware i've been having much less
problems with ubuntu-studio running a preempt kernel, so i might try that
if generic is causing problems.

for now, it's frustrating that x/compiz crashed quickly, but the console
is just chugging along.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"The livestock sector is a major player [in climate
change], responsible for 18% of greenhouse gas
emissions measured in CO2 equivalent. This is a higher
share than transport."
-- Livestock's long shadow, 2006
UN report sponsored by WTO, EU, AS-AID, FAO, et al


Bruce Clement
2011-03-31 07:55:29 UTC
I'm not sure how freeBSD allocates "unused" memory, but Linux has a design
philosophy of "unused memory is wasted memory" and tends to aggressively
allocate any as disk buffers (Although some distributions set their kernels
not to do this http://aplawrence.com/Linux/memory_tuning.html ). I
understand that this was not the behaviour of AT&T Unix & *BSD is in general
a lot closer to AT&T Unix behaviour than Linux.

File copy programs like rsync probably end up allocating large numbers of
disk buffers, and if you have any bad memory in there it will eventually put
something important in the bad memory, so it's always a good place to look
first for problems.
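
To see how much RAM Linux is currently using for buffers and page cache, the figures can be read from /proc/meminfo. This is a rough sketch only: it assumes the MemTotal, Buffers and Cached fields (reported in kB), and the function name cache_share is made up for illustration.

```shell
# Print how much RAM is used for buffers and page cache.
# Reads /proc/meminfo-style text on stdin; values there are in kB.
cache_share() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }
        END {
            if (total > 0)
                printf "buffers+cache: %d kB of %d kB (%d%%)\n",
                       buffers + cached, total, 100 * (buffers + cached) / total
        }
    '
}

# Usage: cache_share < /proc/meminfo
```

Running it before and during a large rsync should show the share climbing as the copy proceeds.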

On Thu, Mar 31, 2011 at 8:41 PM, Atom Smasher <***@smasher.org> wrote:

memtest: 18 pass - 0 fail
On the other hand, this looks like your memory is OK.
Post by Atom Smasher
with 10.04.2 i booted into the 2.6.32-28-generic kernel (instead of the
2.6.32-30-generic that was freezing), recovery mode, single-user and ran
"do-release-upgrade". everything went well.
now (without a full reinstall) it's running ubuntu-10.10 with
2.6.35-28-generic kernel.
with compiz it froze after about an hour, but i couldn't switch to a
console to see what happened. no REISUB. deja-vu.
right now it's under a light load on the console for 12+ hours.
i'm kind of hoping that it does freeze again, so i can take another picture
and file a bug report. on this hardware i've been having much less problems
with ubuntu-studio running a preempt kernel, so i might try that if generic
is causing problems.
for now, it's frustrating that x/compiz crashed quickly, but the console is
just chugging along.
This sort of thing is what really pees me off about Linux ... it seems
there's a new kernel every second or third day, usually because the driver
for some obscure piece of hardware no-one has seen in the wild for 5 years
had a security fix. Which is why I keep thinking about giving freeBSD
another try.

Something in your description makes me wonder if there isn't something wrong
with your graphics card ... memory OK, crashes when graphics is turned on.
--
Bruce Clement

Home: http://www.clement.co.nz/
Twitter: http://twitter.com/Bruce_Clement
Directory: http://www.searchme.co.nz/

"Before attempting to create something new, it is vital to have a good
appreciation of everything that already exists in this field." Mikhail
Kalashnikov
Atom Smasher
2011-03-31 08:15:32 UTC
Post by Bruce Clement
This sort of thing is what really pees me off about Linux ... it seems
there's a new kernel every second or third day, usually because the
driver for some obscure piece of hardware no-one has seen in the wild
for 5 years had a security fix. Which is why I keep thinking about
giving freeBSD another try.
Something in your description makes me wonder if there isn't something
wrong with your graphics card ... memory OK, crashes when graphics is
turned on.
=======================

well, with ubuntu-10.04.2 it crashed within an hour or two even on the
console.

i'm partial towards freebsd, and have been for 10+ years, but having a
crisis of faith... this hardware is about a year old and freebsd-8.2
doesn't have kernel support for the intel video; it can fall back to the
vesa driver but then it won't support the resolution or hardware
acceleration that the hardware can handle.

the kernel support for this intel graphics chipset "may" be in freebsd-9
which is probably 6-12 months away :( until then, i can either let a
reasonably new laptop collect dust, or put ubuntu on it and put it to good
use.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"The future isn't what it used to be."
-- Yogi Berra


Martin D Kealey
2011-04-01 13:20:00 UTC
Post by Atom Smasher
well, with ubuntu-10.04.2 it crashed within an hour or two even on the
console.
I'm wondering if modern kernels are using FrameBuffer even in "text" mode...
In fact even in text mode, you still "use the graphics card"; it has fonts
and whatnot loaded, and keeps a scroll-back with lots of text in it.

-Martin

Atom Smasher
2011-03-31 08:50:13 UTC
so, not long after my last email... CRASH!

http://smasher.org/tmp/IMG_0550.JPG

unresponsive to switching console terminals, REISUB, ^C, paging-up, etc
but seemed to be updating the display about every 91 seconds.

Ubuntu 10.10, 2.6.35-28-generic

LENOVO, 4313CTO, ThinkPad T510, Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz.

very different from the last error, but not at all encouraging...

any ideas....?
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"When a man calls an animal vicious that usually means
it will try to protect itself when he tries to kill it."
-- Rick McIntyre, "A Society of Wolves"


Nevyn
2011-03-31 09:21:28 UTC
Post by Atom Smasher
so, not long after my last email... CRASH!
http://smasher.org/tmp/IMG_0550.JPG
unresponsive to switching console terminals, REISUB, ^C, paging-up, etc but
seemed to be updating the display about every 91 seconds.
Ubuntu 10.10, 2.6.35-28-generic
very different from the last error, but not at all encouraging...
any ideas....?
Some suggestions which I've found - possibly a hard drive fault.
Though in your case, I think this is quite unlikely. Can you try an
older version of Ubuntu and see if the error is replicable?

If it's not, then I'd suspect something going on with the module? for
your hard drive controller or dram controller.

Another thing to try (mostly from looking bits and pieces up on the net):
Append notsc to the kernel line (When booting up, hold down the left
shift key to get into the grub menu.)
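
After rebooting with an extra parameter such as notsc, it is worth confirming that the kernel actually saw it. A minimal sketch, assuming the parameter appears as a whole space-separated word in /proc/cmdline; the function name has_boot_param is hypothetical.

```shell
# Succeed if a boot parameter is present on the kernel command line.
# $1 = parameter to look for, $2 = optional command line text
# (defaults to the running kernel's /proc/cmdline).
has_boot_param() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_boot_param notsc && echo "booted with notsc"
```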

Regards,
Nevyn
http://nevsramblings.blogspot.com/

Nevyn
2011-03-31 09:28:53 UTC
Post by Nevyn
Post by Atom Smasher
so, not long after my last email... CRASH!
http://smasher.org/tmp/IMG_0550.JPG
unresponsive to switching console terminals, REISUB, ^C, paging-up, etc but
seemed to be updating the display about every 91 seconds.
Ubuntu 10.10, 2.6.35-28-generic
very different from the last error, but not at all encouraging...
any ideas....?
Some suggestions which I've found - possibly a hard drive fault.
Though in your case, I think this is quite unlikely. Can you try an
older version of Ubuntu and see if the error is replicable?
If it's not, then I'd suspect something going on with the module? for
your hard drive controller or dram controller.
Append notsc to the kernel line (When booting up, hold down the left
shift key to get into the grub menu.)
The BIOS may also play a role here. Try running:
sudo dmidecode -s bios-version

and checking the results against what's available from Lenovo...

Regards,
Nevyn
http://nevsramblings.blogspot.com/

Glenn Enright
2011-03-31 09:29:05 UTC
That log suggests a hdd issue. Install smartmontools package and check
if the drive is reporting any problems?

Also keep an eye on the machine temperature. If it's load causing this
then that will make the lappy warm up, which can lead to unexpected
crashes...
Post by Atom Smasher
so, not long after my last email... CRASH!
http://smasher.org/tmp/IMG_0550.JPG
unresponsive to switching console terminals, REISUB, ^C, paging-up, etc but
seemed to be updating the display about every 91 seconds.
Ubuntu 10.10, 2.6.35-28-generic
very different from the last error, but not at all encouraging...
any ideas....?
--
       ...atom
 ________________________
 http://atom.smasher.org/
 762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
 -------------------------------------------------
       "When a man calls an animal vicious that usually means
        it will try to protect itself when he tries to kill it."
               -- Rick McIntyre, "A Society of Wolves"
Nevyn
2011-03-31 09:34:44 UTC
Post by Glenn Enright
That log suggests a hdd issue. Install smartmontools package and check
if the drive is reporting any problems?
Also keep an eye on the machine temperature. If it's load causing this
then that will make the lappy warm up, which can lead to unexpected
crashes...
That'd be something interesting to look at. You've looked at X under
load and terminal not under load... What about the terminal under
load? An rsync from the terminal?

Regards,
Nevyn
http://nevsramblings.blogspot.com/

Atom Smasher
2011-03-31 10:22:36 UTC
thanks to all for those suggestions, which i'll be looking into shortly.
Post by Nevyn
That'd be something interesting to look at. You've looked at X under
load and terminal not under load... What about the terminal under load?
An rsync from the terminal?
=================

in the last hour or two, it froze twice while running rsync on the console
and didn't send any errors to the console... just a complete and total
lockup.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"The only normal people are
those you don't know very well!"
-- Joe Ancis


Atom Smasher
2011-03-31 11:26:33 UTC
The BIOS may also play a role here. Try running: sudo dmidecode -s
bios-version
and checking the results against what's available from Lenovo...
=================

turns out there's a BIOS update that's almost a week old. i didn't even
check what was in there; it's older than a week.

we'll see if that does it... that would be convenient.
That log suggests a hdd issue. Install smartmontools package and check
if the drive is reporting any problems?
==================

SMART overall-health self-assessment test result: PASSED

"smartctl -A /dev/sda" looks good. particularly, these are all zero:
Raw_Read_Error_Rate
Reallocated_Sector_Ct
Seek_Error_Rate
Current_Pending_Sector
Offline_Uncorrectable
UDMA_CRC_Error_Count
Multi_Zone_Error_Rate
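
The same check can be scripted so it is easy to repeat after each crash. This is a sketch under assumptions: it expects the usual `smartctl -A` table with the attribute name in column 2 and the raw value in column 10, and the function name check_smart_attrs is made up. On some drives the raw values of Raw_Read_Error_Rate and Seek_Error_Rate are normally non-zero, so a hit on those is not necessarily a fault.

```shell
# Read `smartctl -A` output on stdin and print any watched attribute
# whose raw value (column 10) is non-zero. Exits non-zero if any are found.
check_smart_attrs() {
    awk '
        BEGIN {
            n = split("Raw_Read_Error_Rate Reallocated_Sector_Ct " \
                      "Seek_Error_Rate Current_Pending_Sector " \
                      "Offline_Uncorrectable UDMA_CRC_Error_Count " \
                      "Multi_Zone_Error_Rate", names, " ")
            for (i = 1; i <= n; i++) watch[names[i]] = 1
        }
        ($2 in watch) && ($10 + 0 != 0) { print $2 "=" $10; bad = 1 }
        END { exit bad + 0 }
    '
}

# Usage:
#   sudo smartctl -A /dev/sda | check_smart_attrs && echo "watched attributes are all zero"
```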
Also keep an eye on the machine temperature. If it's load causing this
then that will make the lappy warm up, which can lead to unexpected
crashes...
======================

system load is right around 1.00 which isn't much for an i5 CPU.

i'm running this via ssh:
# print the time, load averages and ACPI temperature every 10 seconds
while :
do
    echo $( date +%T ) - $( cut -d ' ' -f 1-3 < /proc/loadavg ) - $( cat /proc/acpi/thermal_zone/THM0/temperature )
    sleep 10
done

output from that looks like this:
00:24:13 - 1.05 1.04 0.96 - temperature: 37 C
00:24:23 - 1.05 1.04 0.96 - temperature: 38 C
00:24:33 - 1.11 1.05 0.97 - temperature: 39 C
00:24:43 - 1.10 1.05 0.97 - temperature: 39 C

so whatever happens with temperature and load, i'll see it on another
computer after this one crashes. and i'll know when it crashed (i'll be
asleep soon) and how the temp & load is trending prior to crash.

30 minutes of rsync over LAN (that could run all night), system load near
1.00, and it's staying 35-40C, so far.
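
Once the machine has frozen, the saved output of the monitoring loop above can be boiled down to the last sample and the peaks. A minimal sketch, assuming the line format shown in the sample output (time, three load averages, then "temperature: NN C"); the function name summarize_log is hypothetical.

```shell
# Summarise a log produced by the monitoring loop.
# Assumed line format: HH:MM:SS - L1 L5 L15 - temperature: NN C
# Reads the named files, or stdin if none are given.
summarize_log() {
    awk '
        NF >= 8 {
            last = $1                              # time of the most recent sample
            if ($3 + 0 > maxload) maxload = $3 + 0 # 1-minute load average
            if ($8 + 0 > maxtemp) maxtemp = $8 + 0 # temperature in C
        }
        END {
            printf "last sample: %s, peak 1-min load: %.2f, peak temp: %d C\n",
                   last, maxload, maxtemp
        }
    ' "$@"
}

# Usage: summarize_log monitor.log
```

The time of the last sample gives the moment of the crash to within the 10-second polling interval.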
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"The proper, wise balancing of one's whole life may depend
upon the feasibility of a cup of tea at an unusual hour."
-- Arnold Bennett, How to Live on 24 Hours a Day


Atom Smasher
2011-03-31 20:33:42 UTC
as i suspected....

1) it crashed shortly after i went to bed.
2) it doesn't seem correlated to load or temperature.

for now it's running (poorly) the flux screensaver from an ubuntu-8.10
i386 live-CD.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"The thing that bugs me is that the people think the
FDA (Food and Drug Administration) is protecting them.
It isn't. What the FDA is doing and what the public
thinks it's doing are as different as night and day."
-- Dr Ley, former Commissioner of the FDA


Volker Kuhlmann
2011-04-01 10:13:48 UTC
Post by Atom Smasher
as i suspected....
1) it crashed shortly after i went to bed.
2) it doesn't seem correlated to load or temperature.
Most of what you've said so far points to flaky hardware. Your raft of I/O
errors points more towards the path between disk and CPU (cables,
connectors, disk interface) and not to the disk platter (confirmed by
your smartmontools output). Your comments also suggest the problem
involves the graphics hardware - which is most likely memory mapped,
therefore not so easy to pinpoint.

There's still some chance though that it's a race condition in one of
your hardware drivers. You prove this by stress-testing a different
kernel as well as kernel version. For this purpose, Debian, Ubuntu and
all its derivatives are equivalent so you have to use something else. If
you're keen test the absolute latest as well as a 2 year old one. Good
luck...

If you can't prove a software fault you have a dodgy piece of hardware.
The only fix is a replacement. Good luck there too...

And re-consider your $SUBJECT. You know how it is: If your doze crashes,
get Linux, if your Linux crashes, get your hardware fixed. It was true
10 years ago and that hasn't really changed.

Volker
--
Volker Kuhlmann is list0570 with the domain in header.
http://volker.dnsalias.net/ Please do not CC list postings to me.

Cliff Pratt
2011-04-01 22:58:22 UTC
Post by Bruce Clement
This sort of thing is what really pees me off about Linux ... it seems
there's a new kernel every second or third day, usually because the driver
for some obscure piece of hardware no-one has seen in the wild for 5 years
had a security fix. Which is why I keep thinking about giving freeBSD
another try.
I'd say that that is mostly hyperbole. There's been a few kernel updates
on Ubuntu recently, but sometimes you can go a month or two without one.
Drivers should be in modules and shouldn't necessitate a kernel upgrade
anyway.

Cheers,

Cliff

Cliff Pratt
2011-04-01 22:55:25 UTC
Post by Atom Smasher
Post by Cliff Pratt
Bugger! I've seen rsync die nastily before, but never has it taken
down the machine. I'd guess memory problems, as you suggest.
====================
memtest: 18 pass - 0 fail
with 10.04.2 i booted into the 2.6.32-28-generic kernel (instead of the
2.6.32-30-generic that was freezing), recovery mode, single-user and ran
"do-release-upgrade". everything went well.
now (without a full reinstall) it's running ubuntu-10.10 with
2.6.35-28-generic kernel.
with compiz it froze after about an hour, but i couldn't switch to a
console to see what happened. no REISUB. deja-vu.
right now it's under a light load on the console for 12+ hours.
i'm kind of hoping that it does freeze again, so i can take another
picture and file a bug report. on this hardware i've been having much
less problems with ubuntu-studio running a preempt kernel, so i might
try that if generic is causing problems.
for now, it's frustrating that x/compiz crashed quickly, but the console
is just chugging along.
I wonder why some people have problems like this. I've been using Ubuntu
continuously for several years and I've not hit anything like this.

Cheers,

Cliff

Atom Smasher
2011-04-02 02:08:59 UTC
Post by Cliff Pratt
I wonder why some people have problems like this. I've been using Ubuntu
continuously for several years and I've not hit anything like this.
==========

i've been helping others with ubuntu for years, and never seen anything
like this.

i was wondering if the problem was related to tweaking some things when i
installed from the alternate-install image: so i re-installed using the
idiot-resistant desktop image point-n-click install; upgraded from 10.04
to 10.10; and i'm still not getting an hour of uptime before it locks-up.

next i'm going to try installing ubuntu-studio (real-time kernel). i've
had good luck running 2.6.32-30-preempt (and earlier) on the same
hardware, so if that doesn't work out i'll start seriously suspecting the
hardware. i'll also try running freebsd on it and crank up the disk i/o
for a few days before i send it in for warranty repair/replacement.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"According to the Bible, God created the heavens and the
Earth. It is man's prerogative - and woman's - to create
their own particular and private hell."
-- Rod Serling


Simon Bridge
2011-04-02 04:05:34 UTC
FWIW: I have been resisting the upgrade to 10.10 due to a range of
things suddenly not working. I'm holding the upgrades until the next
release.

Bruce Kingsbury
2011-04-02 04:10:34 UTC
I downloaded 11.04 today just for a look.. GAH! I think I'll be moving
everything to debian eventually!

--
Sent from my Ideos!

Robin Sheat
2011-04-02 04:12:13 UTC
Post by Bruce Kingsbury
I downloaded 11.04 today just for a look.. GAH! I think I'll be moving
everything to debian eventually!
I'd expect there'll be an option to make it boot to a standard desktop.

Robin.
Bruce Kingsbury
2011-04-02 04:14:49 UTC
There better be.. really don't like the default

--
Sent from my Ideos!

Nevyn
2011-04-02 09:50:01 UTC
Post by Bruce Kingsbury
There better be.. really don't like the default
Same reason I won't move to 10.10. The context I'm needing is a
netbook interface and unity just has me gritting my teeth. Starting to
think I'm going to need to come up with a customized interface (500
netbooks at the moment. Scaling up to 2,000 next year).

Regards,
Nevyn
http://nevsramblings.blogspot.com/

Atom Smasher
2011-04-05 03:55:03 UTC
i installed freebsd-8.2 and had it running for ~65 hours without any
freezing. the test was rsync-ing from LAN to HDD. if it worked with the
video card, i'd be happy to leave freebsd on it.

since freebsd is running without problems, and ubuntu keeps freezing
within an hour or two, i think it's not a hardware problem.

i was wondering if my "alternate install" of ubuntu may have botched
things, so i installed directly from the idiot-resistant
ubuntu-10.10-desktop-amd64.iso and then did all of my upgrades and
reboots.

i also reset the (latest) BIOS to defaults.

within an hour it froze without any messages.

i rebooted it with -22-generic instead of -28-generic, started up rsync in
a console, and within 90 minutes of uptime it froze. i woke up to find the
console filled with "BUG: soft lockup - CPU#XX stuck for YYs". load and
temperature were normal within ten seconds prior to freezing.

try again running rsync with 2.6.35-28-generic... and 31 minutes uptime...
locked up! no messages on console.

also, how do i get console messages via ssh? i tried "tail -f
/dev/console" and "tail -f /var/log/dmesg" but i'm not getting any
messages before it just drops the ssh connection.
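
One option that fits this situation is the kernel's netconsole module, which sends kernel log messages as UDP packets to another machine, so they can survive a freeze that never reaches the local disk. (/var/log/dmesg is typically just a snapshot written at boot, which would explain why tailing it shows nothing new.) Below is a minimal sketch of a listener for the receiving machine, assuming netconsole's default target port of 6666; the function names are hypothetical and only for illustration.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket on the port netconsole is assumed to send to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_messages(sock, count, timeout=5.0):
    """Collect up to `count` datagrams, stopping early if none arrive in `timeout` seconds."""
    sock.settimeout(timeout)
    messages = []
    for _ in range(count):
        try:
            data, _sender = sock.recvfrom(4096)
        except socket.timeout:
            break
        # Kernel messages are plain text; replace any undecodable bytes.
        messages.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    return messages
```

On the freezing machine, netconsole would be loaded with something like `modprobe netconsole netconsole=@/eth0,6666@192.168.1.10/`, where the interface name and the listener's IP address are placeholders to adjust.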

i'm not sure where to go with this... as in, where should i file a bug
report? it seems as if the ubuntu team doesn't fix kernel bugs, and the
kernel team doesn't deal with bugs from a particular distro's kernel. i
must be missing something....
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"Slaughter of animals for food can exist
only in a barbaric society."
-- Akbarali Jetha


Bruce Clement
2011-04-05 04:11:05 UTC
Post by Atom Smasher
[...]
since freebsd is running without problems, and ubuntu keeps freezing within
an hour or two, i think it's not a hardware problem.
i was wondering if my "alternate install" of ubuntu may have botched
things, so i installed directly from the idiot-resistant
ubuntu-10.10-desktop-amd64.iso and then did all of my upgrades and reboots.
It could still be a hardware error.
If FreeBSD doesn't talk to the video card it won't be sending stuff (the
technical term for data+control signals eludes me) to the control ports &
memory of the card so won't be triggering a crash.
Post by Atom Smasher
i also reset the (latest) BIOS to defaults.
within an hour it froze without any messages.
Wild thought, does the video hardware share memory with the motherboard?
Have you allocated enough memory for your preferred display?

Failing which, did it come with a windows licence? If so, can you restore
windows on it and see if it still crashes?


Bruce
--
Bruce Clement

Home: http://www.clement.co.nz/
Twitter: http://twitter.com/Bruce_Clement
Directory: http://www.searchme.co.nz/

"Before attempting to create something new, it is vital to have a good
appreciation of everything that already exists in this field." Mikhail
Kalashnikov

Nevyn
2011-04-05 04:21:01 UTC
Post by Bruce Clement
[...]
since freebsd is running without problems, and ubuntu keeps freezing within
an hour or two, i think it's not a hardware problem.
i was wondering if my "alternate install" of ubuntu may have botched
things, so i installed directly from the idiot-resistant
ubuntu-10.10-desktop-amd64.iso and then did all of my upgrades and reboots.
It could still be a hardware error.
If FreeBSD doesn't talk to the video card it won't be sending stuff (the
technical term for data+control signals eludes me)  to the control ports &
memory of the card so won't be triggering a crash.
Did have something similar to this - the machine would reboot
everything 3d was attempted. Turned out the video card had overheated
and the capacitors had blown.

Took me a 'lil while to care enough to find out what the hell was going on.

Regards,
Nevyn
http://nevsramblings.blogspot.com/

Nevyn
2011-04-05 04:22:04 UTC
Post by Nevyn
everything 3d was attempted. Turned out the video card had over heated
Wow! I really must be tired. everything = "every time anything"
--
Regards,
Nevyn
http://nevsramblings.blogspot.com/

Atom Smasher
2011-04-05 04:24:30 UTC
Post by Bruce Clement
If FreeBSD doesn't talk to the video card it won't be sending stuff (the
technical term for data+control signals eludes me) to the control ports
& memory of the card so won't be triggering a crash.
==================

hhmmm... i didn't install xorg with the latest freebsd test... but with
ubuntu, it's running xorg even if i'm using the console. maybe i can
disable xorg in ubuntu and see what kind of uptime i get...
Post by Bruce Clement
Wild thought, does the video hardware share memory with the motherboard?
Have you allocated enough memory for your preferred display?
=============

i ~think~ the video card uses the main memory. i'll have to check if
there's a BIOS setting for it. i'm not sure if that's a factor if it runs
for 90 minutes on the console and then freezes.
Post by Bruce Clement
Failing which, did it come with a windows licence? If so, can you
restore windows on it and see if it still crashes?
================

yeah, it came with the windoze tax. i ~think~ i made a backup disk image
before wiping it... i guess as a last resort i can see if it crashes with
windoze, and then it'd certainly be a warranty return.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"I do not feel obliged to believe that the same God
who has endowed us with sense, reason, and intellect
has intended us to forgo their use."
-- Galileo Galilei


Atom Smasher
2011-04-05 07:10:16 UTC
Post by Atom Smasher
hhmmm... i didn't install xorg with the latest freebsd test... but with
ubuntu, it's running xorg even if i'm using the console. maybe i can
disable xorg in ubuntu and see what kind of uptime i get...
================

ubuntu, no xorg, and it barely got past two hours before it locked up. no
messages on the console, but i did get this over ssh from "tail -f
/var/log/syslog":

Apr 5 18:49:58 oreo kernel: [ 7612.941470] ------------[ cut here ]------------

yeah, that's helpful :|

here's what's running, before starting rsync/ssh...

init-+-NetworkManager-+-dhclient
     |                `-{NetworkManager}
     |-acpid
     |-atd
     |-avahi-daemon---avahi-daemon
     |-console-kit-dae---63*[{console-kit-da}]
     |-cron
     |-cupsd
     |-dbus-daemon
     |-5*[getty]
     |-irqbalance
     |-login---zsh---pstree
     |-login
     |-master-+-pickup
     |        `-qmgr
     |-modem-manager
     |-ondemand---sleep
     |-rsyslogd---3*[{rsyslogd}]
     |-sshd
     |-udevd---2*[udevd]
     |-upstart-udev-br
     `-wpa_supplicant
Post by Atom Smasher
Post by Bruce Clement
Failing which, did it come with a windows licence? If so, can you
restore windows on it and see if it still crashes?
================
yeah, it came with the windoze tax. i ~think~ i made a backup disk image
before wiping it... i guess as a last resort i can see if it crashes
with windoze, and then it'd certainly be a warranty return.
===============

uugghhhh.... any other ideas...?
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"The state must declare the child to be the most precious treasure
of the people. As long as the government is perceived as working
for the benefit of the children, the people will happily endure
almost any curtailment of liberty and almost any deprivation."
-- Adolf Hitler, Mein Kampf


Atom Smasher
2011-04-05 11:26:37 UTC
this is getting annoying...

testing again with ubuntu-10.10 without xorg. i'm running srm (against the
files that were previously rsync'd) to keep the disk i/o humming, and a
few ssh sessions to keep an eye on the box.

uptime of almost four hours. again, i'm getting this super-helpful message
in syslog when it locks-up:

Apr 5 22:34:01 oreo kernel: [13220.689159] ------------[ cut here ]------------

great :|

from the console it's sort-of responding to alt-sysrq-REISUB...

i'm getting visual feedback:
... SysRq : Keyboard mode set to system default
... SysRq : Terminate All Tasks
... SysRq : Emergency Sync
... SysRq : Emergency Remount R/O

but... the disk light isn't blinking when it says it's syncing, and
nothing happens when i get to "B".
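
For reference, each letter in REISUB asks the kernel for one action and normally prints one acknowledgement line on the console. The sketch below lists the usual meanings and checks which keys produced no feedback in the output pasted above. The marker strings are the messages these kernels normally print, written from memory, so treat them as an assumption; the function name is hypothetical.

```python
# (key, what it asks the kernel to do, text normally seen in the console message)
SYSRQ_KEYS = [
    ("R", "unRaw: take the keyboard back from X", "Keyboard mode set to"),
    ("E", "tErminate: send SIGTERM to all processes except init", "Terminate All Tasks"),
    ("I", "kIll: send SIGKILL to all processes except init", "Kill All Tasks"),
    ("S", "Sync: flush dirty buffers to disk", "Emergency Sync"),
    ("U", "Unmount: remount filesystems read-only", "Emergency Remount R/O"),
    ("B", "reBoot: reset immediately", "Resetting"),
]


def keys_without_feedback(console_lines):
    """Return the SysRq keys whose acknowledgement never showed up on the console."""
    missing = []
    for key, _meaning, marker in SYSRQ_KEYS:
        if not any(marker in line for line in console_lines):
            missing.append(key)
    return missing


observed = [
    "... SysRq : Keyboard mode set to system default",
    "... SysRq : Terminate All Tasks",
    "... SysRq : Emergency Sync",
    "... SysRq : Emergency Remount R/O",
]
print(keys_without_feedback(observed))
```

An acknowledgement only shows that the keypress was received. The sync and remount themselves are, as far as I understand, handed off to kernel threads, so on a wedged system the message can appear without any disk activity following it.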

right now it's running a BIOS HDD diagnostic app. if that finishes before
i go to sleep i'll start memtest. but it seems happy enough with freebsd,
so i'm still suspicious that this is an edge case with linux.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"They do not bear arms, and do not know them, for
I showed them a sword, they took it by the edge
and cut themselves out of ignorance. They have
no iron. Their spears are made of cane...
They would make fine servants...
With fifty men we could subjugate them all and
make them do whatever we want."
-- Christopher Columbus,
after "Discovering America"


Bruce Clement
2011-04-05 12:07:32 UTC
Post by Atom Smasher
this is getting annoying...
Yes, I'm feeling frustrated by proxy & I'm not the poor sod living through
this problem.

I've been Googling for T510 Linux problems and the following 3 pages talk
about incompatibility between Linux and the T510

http://www.thinkwiki.org/wiki/Category:T510
http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T510-amp-Full-HD-Display-with-Linux/td-p/328613
http://www.linlap.com/wiki/lenovo+thinkpad+t510

Summarising: If you have a 6250 wireless controller expect problems. It
works best if you use the proprietary NVIDIA driver & "If using the
proprietary Nvidia driver (NVIDIA Driver Version 260.19.06) one has to set
"Discrete" in BIOS. Linux seems to have some problems with Switchable
Graphics (Optimus)"

Not that any of this should be relevant with X switched off, but you never
know.

More silly questions:

Are you comparing apples with apples: 64 bit FreeBSD vs 64 bit Linux // 32
bit FreeBSD vs 32 bit Linux?

Are you using native network drivers or project evil / NDISWrappers? If so,
is this info (from http://en.wikipedia.org/wiki/NDISwrapper) relevant?
"NDISwrapper does not implement NDIS 6 (Windows Vista version) yet, limiting
drivers to Windows XP. While it is not a major problem for the x86
architecture because of the popularity of Windows XP x86-32, many vendors
choose to make 64-bit driver versions only for Windows Vista, which means
that Linux systems using the x86-64 architecture are unable to use such
networking devices (either NDIS5 32 bits because they are 64bits systems or
NDIS6 64bit drivers because they can't use NDIS6). It's possible to use
Windows XP 64 bit drivers which implement NDIS5 [ndiswrapper forum:
http://sourceforge.net/projects/ndiswrapper/forums/forum/323168/topic/3755985],
however, there are fewer available drivers for xp64 (NDIS5/64 bit) than for
XP32 (NDIS5/32 bit)."

Bruce
--
Bruce Clement

Home: http://www.clement.co.nz/
Twitter: http://twitter.com/Bruce_Clement
Directory: http://www.searchme.co.nz/

"Before attempting to create something new, it is vital to have a good
appreciation of everything that already exists in this field." Mikhail
Kalashnikov

Atom Smasher
2011-04-06 00:12:01 UTC
Post by Bruce Clement
I've been Googling for T510 Linux problems and the following 3 pages
talk about incompatibility between Linux and the T510
http://www.thinkwiki.org/wiki/Category:T510
http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T510-amp-Full-HD-Display-with-Linux/td-p/328613
http://www.linlap.com/wiki/lenovo+thinkpad+t510
===============

of course there's also plenty of reports about T510s having problems with
windows, including freezing.

on my other T510 everything is smooth, except for freezing on rare
occasion, over the last year. that's currently running
ubuntu-studio-10.04.2 with 2.6.32-30-preempt. maybe i should swap drives
and see what kind of uptime i can get on the troubled machine...
Post by Bruce Clement
Summarising: If you have a 6250 wireless controller expect problems. It
works best if you use the proprietary NVIDIA driver & "If using the
proprietary Nvidia driver (NVIDIA Driver Version 260.19.06) one has to
set "Discrete" in BIOS. Linux seems to have some problems with
Switchable Graphics (Optimus)"
Not that any of this should be relevant with X switched off, but you
never know.
===========

intel graphics and 6300 wireless.
Post by Bruce Clement
Are you comparing apples with apples: 64 bit FreeBSD vs 64 bit Linux //
32 bit FreeBSD vs 32 bit Linux?
============

64 bit all around.
Post by Bruce Clement
Are you using native network drivers or project evil / NDISWrappers?
==========

non-evil, and networking seems to be working alright... there *may* be a
correlation between network traffic and freezing, since freezes seem to
happen quicker when i'm pulling traffic into the box, but network stuff is
ostensibly working.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"Not a single war has been fought by vegetarians."
-- Akbarali Jetha


Atom Smasher
2011-04-11 17:26:46 UTC
i tried ubuntu-11.04 and it sometimes ran for 2-24 hours before freezing.

the "unity" desktop in 11.04 is hellish. i've turned on several "i'm not a
computer person" people to ubuntu, and they all like it better than
windoze, but unity will crap all over that. the beta CD that i tried did
have an "ubuntu classic" selection in GDM. what were they thinking???

anyway... lenovo tech support directed me to a PC-Doctor bootable
diagnostic CD that's specific to the T510 (and related) hardware. that's
showing some memory faults that memtest didn't find.

with any luck i'll get the RAM swapped under warranty without too much
hassle, and that'll fix the (latest) problems i've been having with this
machine. then i'll install ubuntu-10.10 on it, and probably run freebsd
via qemu.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"You have just dined, and however scrupulously
the slaughterhouse is concealed in the graceful
distance of miles, there is complicity."
-- Ralph Waldo Emerson, 1870


Atom Smasher
2011-04-15 02:30:46 UTC
it's definitely looking like hardware.

memtest showed 4 errors over 28 hours, all in the last 100M of space. two
of the failures were at the same address that was flagged by pc-doctor.

i'm still waiting for the replacement RAM (which is not in NZ), but i've
got some temporary RAM in there for now. it's running ubuntu-10.04.2,
2.6.32-30-generic. uptime so far is 17+ hours (a new record on this
machine!) and it's been running rsync/ssh & glmatrix.

did someone suggest this...? i suspect that freebsd must have not been
using the area of memory that's bad, but apparently linux was finding it
and consistently freezing within an hour or two.

moral of the story: when in doubt, let memtest run for at least 24 hours.
just because it runs overnight for 10+ passes and zero errors doesn't mean
everything is fine.
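
A rough way to see why a clean overnight run proves little: if the 4 errors in 28 hours are treated as random, independent events at a constant rate (a Poisson assumption, not something the thread establishes), the chance of a shorter run showing zero errors can be estimated. The 8-hour figure for "overnight" is also an assumption, and the function name is hypothetical.

```python
import math


def p_clean_run(errors_seen, hours_observed, run_hours):
    """Probability of seeing zero errors in `run_hours`, assuming Poisson arrivals."""
    rate_per_hour = errors_seen / hours_observed
    return math.exp(-rate_per_hour * run_hours)


# Observed in this thread: 4 errors over 28 hours of memtest.
print(round(p_clean_run(4, 28, 8), 3))   # assumed 8-hour overnight run
print(round(p_clean_run(4, 28, 28), 3))  # a full 28-hour run
```

Under those assumptions an 8-hour run comes back clean roughly one time in three, while a 28-hour run comes back clean only about 2% of the time.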
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"Your password must be at least 18770 characters and
cannot repeat any of your previous 30689 passwords.
Please type a different password. Type a password
that meets these requirements in both text boxes."
-- Microsoft takes security seriously in
Knowledge Base Article Q276304.


Bruce Kingsbury
2011-04-15 02:33:18 UTC
http://rick.vanrein.org/linux/badram/

Might get you by until you can get some replacement ram?
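
Besides the BadRAM patch, stock kernels accept a `memmap=nn[KMG]$ss[KMG]` boot parameter that marks a region of physical memory as reserved so the kernel leaves it alone. A minimal sketch of turning a list of failing addresses into such a parameter follows. The addresses are made up, since the thread doesn't give the real ones, and the helper name and the one-page padding are illustrative choices.

```python
PAGE_SIZE = 4096  # bytes; the usual page size on x86-64


def memmap_arg(bad_addresses, pad_pages=1):
    """Build a memmap= parameter reserving the pages around the failing addresses."""
    first_page = max(min(bad_addresses) // PAGE_SIZE - pad_pages, 0)
    # End is exclusive: one past the last bad page, plus padding.
    end_page = max(bad_addresses) // PAGE_SIZE + 1 + pad_pages
    start = first_page * PAGE_SIZE
    size_kib = (end_page - first_page) * PAGE_SIZE // 1024
    return "memmap=%dK$0x%x" % (size_kib, start)


# Hypothetical failing addresses, as a memory tester might report them.
print(memmap_arg([0xF9A01234, 0xF9A0F000]))
```

When this goes into a GRUB configuration file, the `$` usually needs escaping so it isn't treated as a variable.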

Bruce Clement
2011-04-15 02:54:35 UTC
Post by Atom Smasher
[...]
did someone suggest this...? i suspect that freebsd must have not been
using the area of memory that's bad, but apparently linux was finding it and
consistently freezing within an hour or two.
Or FreeBSD loaded something into that memory that wasn't accessed in your
use of that machine while what Linux loaded in there was used and possibly
even dynamic. Or even, if the fault is one bit "sometimes" having the wrong
value, but always going in the same direction, then there's a 50% chance
that $OperatingSystem will have that value in there anyway.
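
The 50% figure can be checked with a small model of a "stuck" bit: a cell that always reads back as the same value regardless of what was written. The sketch below is an idealised model rather than a description of the actual fault in this machine, and the function names are hypothetical. It counts how many of all possible byte values survive such a fault unchanged.

```python
def read_back(value, bit, stuck_at):
    """Return what a byte reads back as when one bit is stuck at 0 or 1."""
    mask = 1 << bit
    return (value | mask) if stuck_at else (value & ~mask)


def fraction_unaffected(bit, stuck_at, width=8):
    """Fraction of all `width`-bit values that read back correctly despite the fault."""
    total = 1 << width
    intact = sum(1 for v in range(total) if read_back(v, bit, stuck_at) == v)
    return intact / total


print(fraction_unaffected(bit=3, stuck_at=1))
```

Exactly half of the values already have the stuck bit in the "right" state, so data that happens to land there is corrupted only about half the time, assuming the stored bits are evenly distributed.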
Post by Atom Smasher
moral of the story: when in doubt, let memtest run for at least 24 hours.
just because it runs overnight for 10+ passes and zero errors doesn't mean
everything is fine.
Yep, last time I used memtest to seriously diagnose an intermittent problem
I fired it up 5PM Friday and looked at the output 9AM Monday.


Glad you've found the problem.
--
Bruce Clement

Home: http://www.clement.co.nz/
Twitter: http://twitter.com/Bruce_Clement
Directory: http://www.searchme.co.nz/

"Before attempting to create something new, it is vital to have a good
appreciation of everything that already exists in this field." Mikhail
Kalashnikov

Robin Sheat
2011-04-15 13:22:04 UTC
Post by Bruce Clement
Or Freebsd loaded something into that memory that wasn't accessed in your
use of that machine while what Linux loaded in there was used and possibly
even dynamic. Or even, if the fault is one bit "sometimes" having the wrong
value, but always going in the same direction, then there's a 50% chance
that $OperatingSystem will have that value in there anyway.
Yep, I've had a fault that caused a certain program to not compile. The error
only showed up on one particular test of memtest (although, fairly
consistently.) Removing that RAM made everything better again.

Glad to hear it wasn't a Linux problem as much as a "way Linux is using your
hardware to its max" problem :)

Robin.

Atom Smasher
2011-04-05 04:49:20 UTC
i'm not seeing anything in BIOS to adjust video RAM.

it's running now without xorg. we'll see how this goes...
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"They have computers, and they may have
other weapons of mass destruction."
-- Janet Reno, US Attorney General,
27 Feb 1998


Shiv Manas
2011-04-05 11:47:19 UTC
Post by Atom Smasher
i'm not seeing anything in BIOS to adjust video RAM.
it's running now without xorg. we'll see how this goes...
Have you tried 2.6.38.x yet? If there's any kernel you might want to try,
this one would be it. Massive improvements overall; possibly the most
significant kernel update in years.

Atom Smasher
2011-04-05 23:55:42 UTC
Post by Shiv Manas
Have you tried 2.6.38.x yet? If there's any kernel you might want to
try, this one would be it. Massive improvements overall; possibly the
most significant kernel update in years.
=====================

worth a shot. i'll try to get ubuntu-11.04-beta1-desktop-amd64.iso
downloaded tonight.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"There can be no greater good than the quest for peace,
and no finer purpose than the preservation of freedom."
-- U.S. President Ronald Reagan

