Showing posts with label PXE. Show all posts

Monday, 7 March 2011

Troubleshooting PXE in SCCM OSD Part 3

Troubleshooting PXE in SCCM OSD Part 1
Troubleshooting PXE in SCCM OSD Part 2
Troubleshooting the TFTP Service
Now that the PXE process is working correctly, we can look at troubleshooting errors surrounding abortpxe.com. If you get this error message then you at least have a working PXE environment, even if SCCM doesn't think it should offer a Task Sequence to the machine. Here are some reasons you'll get this error.

  • The machine has not been registered in a build collection

    This is the simplest reason for the error. Does the machine have a Task Sequence advertised to it? If not, create a collection, advertise a Task Sequence to that collection and add your machine to the collection. Check smspxe.log, where you should see an error such as

    ProcessDatabaseReply: No Advertisement found in Db for device 05/03/2011 08:51:36 10368 (0x2880)

  • The machine has been recently registered in a build collection, but the server takes some time (up to an hour) to process this information

    This is commonly seen when a technician PXE boots the machine to write down the MAC address. If you then create a new computer object based on the MAC address, you need to wait an hour before the WDS service will look up the database again. You can see this happening in smspxe.log with an entry such as

    MAC=FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF:FF SMBIOS GUID=00000000-0000-0000-0000-000000000000 > Device not found in the database. 07/03/2011 15:18:46 8552 (0x2168)

    This is fixed by this hotfix or SP2 for SCCM. The patch alone won't fix this behavior; you also need to configure a registry setting. If it doesn't already exist, create a REG_DWORD value at
    HKLM\SOFTWARE\Microsoft\SMS\PXE\CacheExpire

    ...or on a 64-bit server at...

    HKLM\SOFTWARE\Wow6432Node\Microsoft\SMS\PXE\CacheExpire

    Set the value of CacheExpire to the timeout you want in seconds - a value of 600 gives a timeout of 10 minutes. On an SP2 SCCM site, setting the value to 0 will actually set the timeout to 3600 seconds (back to the 1 hour timeout).

    If you are unable to apply the hotfix or SP2, stopping and restarting the WDS service can flush out the cache.
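As a minimal sketch of the CacheExpire setting described above, the Python below builds the text of a .reg file rather than touching the registry directly, so you can review it before importing. The function name cache_expire_reg and its wow64 flag are my own for illustration; the key path and value name are the ones given above.

```python
def cache_expire_reg(seconds: int, wow64: bool = False) -> str:
    """Return .reg file text that sets the CacheExpire REG_DWORD to `seconds`.

    Hypothetical helper: pass wow64=True for the Wow6432Node path that the
    post gives for 64-bit servers.
    """
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("a REG_DWORD must fit in 32 bits")
    parts = ["HKEY_LOCAL_MACHINE", "SOFTWARE"]
    if wow64:
        parts.append("Wow6432Node")
    parts += ["Microsoft", "SMS", "PXE"]
    key = "\\".join(parts)
    # .reg files store DWORD values as eight hexadecimal digits.
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key}]\n"
        f'"CacheExpire"=dword:{seconds:08x}\n'
    )


if __name__ == "__main__":
    # 600 seconds = 10 minutes
    print(cache_expire_reg(600, wow64=True))
```

Save the output to a file with a .reg extension and import it on the PXE Service Point. Remember that on an SP2 site a value of 0 falls back to the one hour timeout.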

  • The SMBIOS GUID of the machine is not unique

    This can be seen if you have older hardware, or if you've had an engineer swap out some motherboards and not flashed the BIOS correctly.

    If you want to find out which machines have duplicate SMBIOS GUIDs, you can run this report:

    SELECT SMBIOS_GUID0, COUNT(SMBIOS_GUID0) AS Count
    FROM v_R_System
    WHERE (Active0 = 1) AND (Client0 = 1) AND (Obsolete0 = 0)
    GROUP BY SMBIOS_GUID0
    HAVING (COUNT(SMBIOS_GUID0) > 1)

    You can then use the following report to pull out the names of the machines with duplicate SMBIOS GUIDs:

    SELECT SMBIOS_GUID0, Name0
    FROM v_R_System
    WHERE SMBIOS_GUID0 = '00000000-0000-0000-0000-000000000000'

    Replace 00000000-0000-0000-0000-000000000000 with the GUID that you identified in the previous report.

    The only way to solve this problem is to flash the BIOS on the affected workstation to set a unique SMBIOS GUID. Contact the PC vendor for the tool to do this.
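The two reports above can also be combined so that the machine names come back in one pass. The sketch below is only an illustration: it runs the combined query against an in-memory SQLite table standing in for v_R_System, with made-up machine names and GUIDs. The column names are the ones used in the reports above; on a real site you would run the SQL against the site database instead, and the function names here are my own.

```python
import sqlite3

# Combined form of the two reports: names of all active, non-obsolete
# clients whose SMBIOS GUID appears more than once.
DUPLICATES_SQL = """
SELECT s.SMBIOS_GUID0, s.Name0
FROM v_R_System AS s
JOIN (
    SELECT SMBIOS_GUID0
    FROM v_R_System
    WHERE Active0 = 1 AND Client0 = 1 AND Obsolete0 = 0
    GROUP BY SMBIOS_GUID0
    HAVING COUNT(SMBIOS_GUID0) > 1
) AS d ON d.SMBIOS_GUID0 = s.SMBIOS_GUID0
WHERE s.Active0 = 1 AND s.Client0 = 1 AND s.Obsolete0 = 0
ORDER BY s.SMBIOS_GUID0, s.Name0
"""


def demo_connection():
    """Build a stand-in v_R_System table with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE v_R_System (Name0 TEXT, SMBIOS_GUID0 TEXT, "
        "Active0 INTEGER, Client0 INTEGER, Obsolete0 INTEGER)"
    )
    conn.executemany(
        "INSERT INTO v_R_System VALUES (?, ?, ?, ?, ?)",
        [
            ("PC-A", "AAAA", 1, 1, 0),
            ("PC-B", "AAAA", 1, 1, 0),  # same GUID as PC-A
            ("PC-C", "BBBB", 1, 1, 0),  # unique GUID
            ("PC-D", "AAAA", 1, 1, 1),  # obsolete record, ignored
        ],
    )
    return conn


def find_duplicate_guids(conn):
    """Return (guid, machine name) rows for duplicated SMBIOS GUIDs."""
    return conn.execute(DUPLICATES_SQL).fetchall()


if __name__ == "__main__":
    for guid, name in find_duplicate_guids(demo_connection()):
        print(guid, name)
```

Unlike the second report above, this version also applies the Active0, Client0 and Obsolete0 filters to the names it returns, so obsolete records do not show up in the list.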

  • The machine is linked to an obsolete object on the server

    This can happen if you have "Automatically create new client records for duplicate hardware IDs" set in the Advanced tab of your site properties. The solution to this one is to manually delete those obsolete objects.

  • The machine was imaged using a technology such as Ghost, but the SID and/or SCCM client GUID were not reset

    This can be a bit of a pain to troubleshoot on the server - I once saw a machine that according to the SCCM reports had 30 separate users logging into it. Since this machine was kept in a locked office, this appeared to be a bit odd. It turned out the support team had used Ghost to image one of their machines and then deployed this image to all the machines in their department.

    This highlights a wider point about deploying SCCM in your environment - the processes and procedures that worked in the past may need revising. In this case, they'd never had a problem before because their authentication was handled by NetWare.

    The easiest way to fix this problem is to power off the machine, delete the computer object in SCCM, recreate the record manually then PXE build the machine.

Other pre-Windows PE errors

  • \Boot\BCD error

    Assuming you can get past abortpxe.com, there is another error you can see at this stage. After pressing the F12 key to PXE boot you can sometimes see



    Windows Boot Manager (Server IP: x.y.z.a)

    Windows failed to start. A recent hardware or software change might be the cause.

    File: \Boot\BCD
    Status: 0xc000000f
    Info: An error occurred while attempting to read the boot configuration data.


    The simple solution is to delete the computer object and recreate it, which should fix this problem. I've only ever seen this problem with SCCM 2007 SP2 when deploying Windows 7.

    This does look like a BCD error, but the SCCM implementation of WDS has no single boot.bcd file. Instead, a boot.bcd file is created on the fly in the RemoteInstall\SMSTemp folder with a name of the form year.month.day.hour.minute.number.number.guid.boot.bcd.

    If anyone knows the actual fix for this (without having to delete the computer object) please post in the comments!

  • Only using 32-bit boot images when you have 64-bit machines in your environment

    Again, this one seems a bit odd. If your workstation is 64-bit (and you'd be hard pressed to find a non-64-bit machine these days), then you need the 64-bit boot files available - even if you are only deploying 32-bit Windows, and are using a 32-bit boot image. The 64-bit boot files are extracted from the boot image and used during the initial PXE process, so if they're missing, you won't be able to PXE boot a 64-bit machine.

    If you're getting this error, you'll see something like this in smspxe.log

    The SMS PXE Service Point does not have a boot image matching the processor architecture of the PXE booting device.
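If you end up reading smspxe.log a lot, the messages quoted in this post can be matched mechanically. The following is a rough sketch, not a full log parser: it only knows the three messages shown above, the hint text is my own summary of this post, and the function names are hypothetical.

```python
import re

# Fragments of the smspxe.log messages quoted in this post, each paired
# with a short hint that summarises the cause described above.
KNOWN_MESSAGES = [
    ("No Advertisement found in Db for device",
     "no Task Sequence is advertised to this machine"),
    ("Device not found in the database",
     "machine is unknown, or the WDS cache has not expired yet"),
    ("does not have a boot image matching the processor",
     "boot files for the client's architecture are missing"),
]

# Matches the "MAC=... SMBIOS GUID=..." prefix seen in the sample entry.
DEVICE_RE = re.compile(
    r"MAC=(?P<mac>[0-9A-Fa-f:]+)\s+SMBIOS GUID=(?P<guid>[0-9A-Fa-f-]{36})"
)


def classify(line):
    """Return a hint for a known smspxe.log message, or None."""
    for fragment, hint in KNOWN_MESSAGES:
        if fragment in line:
            return hint
    return None


def device_from_line(line):
    """Return (mac, smbios_guid) if the line names a device, else None."""
    match = DEVICE_RE.search(line)
    if match is None:
        return None
    return match.group("mac"), match.group("guid")


def scan(path):
    """Yield (line, hint) for every recognised line in a log file."""
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            hint = classify(line)
            if hint is not None:
                yield line.rstrip("\n"), hint
```

Point scan() at a copy of smspxe.log and it yields only the lines it recognises, which is often enough to tell which of the cases above you are dealing with.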

Troubleshooting PXE in SCCM OSD Part 2

Troubleshooting PXE in SCCM OSD Part 1
Troubleshooting PXE in SCCM OSD Part 3
Troubleshooting the TFTP Service
PXE-E32: TFTP Open Timeout

Assuming your client gets an IP address, there are still many ways for it to fail before you even get an abortpxe.com message. PXE-E32 TFTP open timeout can be a frustrating message - but it does at least give you a clue where to look.

This error means that your client machine can't access the TFTP daemon running on your PXE Service Point. Assuming your PXE Service Point is set up correctly (check the WDS service is running), the most common reason for this message is network filters/firewall settings. Fortunately, Microsoft provide a document which lists the ports that need to be open for the TFTP daemon to work. Read this document carefully: you need to open more than just ports 69 and 4011. The daemon listens on port 69 but responds from a randomly chosen high port, so you'll need to configure the network filter rules to allow this behavior before TFTP will work.
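To see the "listens on 69, replies from a high port" behavior for yourself, you can send a single TFTP read request from a machine on the client subnet and look at where the answer comes from. This is a minimal sketch, assuming plain TFTP as described in RFC 1350; the function names are my own, and the server address in the usage note is a placeholder.

```python
import socket
import struct


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (
        struct.pack("!H", 1)
        + filename.encode("ascii") + b"\0"
        + mode.encode("ascii") + b"\0"
    )


def probe_tftp(server: str, filename: str, timeout: float = 5.0):
    """Send one read request to port 69 and wait for the first reply.

    Returns (opcode, source_port) of the reply, or None if nothing
    arrives before the timeout.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, 69))
        try:
            data, (_, source_port) = sock.recvfrom(4 + 512)
        except socket.timeout:
            return None
    if len(data) < 2:
        return None
    (opcode,) = struct.unpack("!H", data[:2])
    return opcode, source_port
```

For example, probe_tftp("192.0.2.10", "smsboot\\x86\\wdsnbp.com") returning None is consistent with the filtering problem described above, although a firewall on the probing machine can have the same effect. A reply with opcode 3 is a data block and opcode 5 is a TFTP error; in both cases the source port shows which high port the daemon picked for its reply.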

You might also see this error if DHCP is misconfigured. If you have DHCP and the PXE Service Point on different servers then you'll need to set option 66, the Boot Server Host Name. A small tip here - use the IP address of the PXE Service Point when troubleshooting this setting, as this removes the possibility that it's a DNS resolution issue. You can always set it back once you're happy everything is working again.

PXE-E53: No boot filename received

Check option 67 on the DHCP server. It should be something like

smsboot\x86\wdsnbp.com
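If you have a packet capture of the DHCP offer, you can check what options 66 and 67 actually contain rather than trusting the DHCP console. The sketch below assumes the standard code, length, value layout of DHCP options from RFC 2132 and expects only the options bytes (everything after the four byte magic cookie). The function names and the sample server address are hypothetical.

```python
def encode_option(code: int, value: str) -> bytes:
    """Encode one DHCP option as code byte, length byte, value bytes."""
    data = value.encode("ascii")
    if not 1 <= code <= 254:
        raise ValueError("option code must be between 1 and 254")
    if len(data) > 255:
        raise ValueError("option value is longer than 255 bytes")
    return bytes([code, len(data)]) + data


def parse_options(raw: bytes) -> dict:
    """Parse a DHCP options field into {code: value bytes}."""
    options = {}
    i = 0
    while i < len(raw):
        code = raw[i]
        if code == 255:          # end-of-options marker
            break
        if code == 0:            # pad byte, no length or value
            i += 1
            continue
        if i + 1 >= len(raw):    # truncated field, stop parsing
            break
        length = raw[i + 1]
        options[code] = raw[i + 2:i + 2 + length]
        i += 2 + length
    return options


if __name__ == "__main__":
    # Build a sample options field and read it back.
    raw = (
        encode_option(66, "192.0.2.10")
        + encode_option(67, "smsboot\\x86\\wdsnbp.com")
        + b"\xff"
    )
    for code, value in parse_options(raw).items():
        print(code, value.decode("ascii"))
```

If option 67 is missing from the parsed result, or contains a path that doesn't exist on the PXE Service Point, that matches the PXE-E53 and PXE-E3B symptoms described in this post.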

PXE-E55: ProxyDHCP service did not reply to request on port 4011

Related to the TFTP timeout problem, this suggests a firewall or routing issue. Check that the firewall allows UDP port 4011 through.

If the client and the PXE Service Point are on different subnets, check that the traffic is being forwarded from the client subnet to the PXE Service Point.

PXE-E3B: TFTP error file not found

At this point we know the client is getting service from DHCP and has managed to find the TFTP server and request the boot file. Two things to check here are

  1. Option 67 is configured correctly and pointing to a file that exists on the server
  2. The files are actually on the TFTP server

Check the SMSBoot folder in the reminst share on the PXE Service Point. There should be 3 folders in the SMSBoot folder - ia64, x64 and x86. Each folder should contain some boot files. If not, you have problems!
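The folder check above is easy to script if you have several PXE Service Points to look after. This is a minimal sketch: the three folder names come from this post, the function name is my own, and it only checks that each folder exists and contains at least one file, not that the files are the right ones.

```python
from pathlib import Path

# The three architecture folders expected under SMSBoot.
EXPECTED_ARCHES = ("ia64", "x64", "x86")


def check_smsboot(smsboot_dir):
    """Return a list of problems found under the SMSBoot folder.

    An empty list means every expected folder exists and holds at
    least one file.
    """
    root = Path(smsboot_dir)
    problems = []
    for arch in EXPECTED_ARCHES:
        folder = root / arch
        if not folder.is_dir():
            problems.append(f"{arch}: folder missing")
        elif not any(entry.is_file() for entry in folder.iterdir()):
            problems.append(f"{arch}: folder is empty")
    return problems
```

Pass it the path to the SMSBoot folder in the reminst share and print whatever it returns.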

The missing boot files can be fixed in a number of ways. The easiest way is to just copy the correct files over from a working PXE Service Point. I would not recommend this though - the files are missing for a reason, and you should really fix the underlying cause.

This error can be caused by a number of things: updating drivers in the default OSD Boot Images, restarting the server hosting the PXE Service Point or just a botched PXE Service Point install. The first thing you should try is clearing out temp files used by PXE.

  • Stop the WDS Service
  • Delete (or move) the folder %temp%\PXEBootFiles
  • Start the WDS Service
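The three steps above can be wrapped in a small script. Treat this as a sketch under assumptions: I'm assuming the WDS service is registered under the name WDSServer, the function name is my own, and you have to pass in the temp directory yourself because %temp% resolves differently depending on which account you are running as. It moves the folder aside rather than deleting it, and defaults to a dry run that only lists what it would do.

```python
import shutil
import subprocess
from pathlib import Path


def reset_pxe_temp(temp_dir, service="WDSServer", dry_run=True):
    """Stop WDS, move PXEBootFiles aside, then start WDS again.

    `service` is an assumed service name. With dry_run=True nothing is
    changed and the planned steps are returned as strings.
    """
    source = Path(temp_dir) / "PXEBootFiles"
    backup = source.with_name("PXEBootFiles.old")
    plan = [
        f"net stop {service}",
        f"move {source} to {backup}",
        f"net start {service}",
    ]
    if dry_run:
        return plan
    subprocess.run(["net", "stop", service], check=True)
    try:
        if source.exists():
            shutil.move(str(source), str(backup))
    finally:
        # Always try to bring the service back, even if the move failed.
        subprocess.run(["net", "start", service], check=True)
    return plan
```

Run it once with the default dry_run=True and check the printed plan before letting it make changes. If a PXEBootFiles.old folder is left over from an earlier run, remove it first, otherwise the move will nest the folder inside it.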

If this doesn't work, it might be a more fundamental problem with the PXE Service Point. Remove the role from the server, restart the server hosting the PXE Service Point and add the role back.

Friday, 18 February 2011

Troubleshooting PXE in SCCM OSD Part 1

PXE booting makes deploying OS images much simpler for end user technicians. There is a lot that can go wrong though, especially if you're attempting to run it in a high security, heavily filtered network.

In the next few blog posts I'll cover how to go about troubleshooting PXE errors in OSD.

When a PXE failure occurs it helps to be very precise with the step it failed at. The place at which a PXE build fails can tell us where to investigate.

Some possible causes of error in a PXE build are:

  • Workstation BIOS configuration and/or lack of RAM
  • Duplicate SMBIOS id (typically seen on older hardware)
  • DHCP Server configuration
  • Network filters / configuration
  • WDS service failure
  • PXE service point failure
  • Wrong collection membership in SCCM
  • WDS cached collection membership
  • Obsolete objects in SCCM
  • Network drivers for Vista/7 are available, but not for XP
  • Network drivers are not available for Vista/7

On the server side there's one log file that will help you immensely. If you have set up the PXE Service Point on the site server it can be found at

%ProgramFiles%\SMS_CCM\Logs\smspxe.log

Or, if you have configured another server as the PXE Service Point it will be found at

SMS_CCM\Logs\smspxe.log

in the root of the drive SCCM is using.

Using Trace32 to view this log file can give you realtime information on the PXE boot process. However, the first error we'll look at won't even show up in this log.

PXE-E51: No DHCP or proxyDHCP offers were received

The most common PXE error I see is PXE-E51. The first indication that something is wrong is when you see DHCP... and you get more than three or four dots.



The PXE process fails at this point with PXE-E51: No DHCP or proxyDHCP offers were received.



This error basically says that the machine can't obtain an IP address. Possible reasons for this include

  1. Your DHCP server isn't working
  2. If you use DHCP reservations you may have made a mistake entering the MAC address of this machine
  3. You don't have a DHCP pool set up for this subnet, or the pool has no free addresses
  4. Your DHCP server is on a different subnet and you haven't set up an IP forwarder or DHCP Relay agent
  5. The network cable or port is broken

Most of these problems are easy to check, or are easy for your networking people to check. Once the problem is fixed the PXE boot process works properly in most cases. Assuming your network is configured to allow PXE booting this error normally means one of two things - the cable is faulty or there's no DHCP reservation/the DHCP pool is exhausted.
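Mistyped MAC addresses in DHCP reservations are easier to spot if you normalise both sides before comparing them, since the address the technician wrote down and the one in the DHCP console are often formatted differently. A minimal sketch, with function names of my own choosing:

```python
import re


def normalize_mac(text: str) -> str:
    """Return a MAC address as upper-case, colon-separated octets.

    Accepts common separators (colon, hyphen, dot, space) and raises
    ValueError for anything that is not exactly 12 hex digits.
    """
    if not re.fullmatch(r"[0-9A-Fa-f:.\-\s]+", text):
        raise ValueError(f"unexpected characters in MAC address: {text!r}")
    digits = re.sub(r"[^0-9A-Fa-f]", "", text)
    if len(digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {len(digits)}: {text!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2)).upper()


def same_mac(a: str, b: str) -> bool:
    """Compare two MAC addresses regardless of formatting."""
    return normalize_mac(a) == normalize_mac(b)
```

For instance, same_mac("00-1a-2B-3c-4D-5e", "001a.2b3c.4d5e") is True, while a reservation with a digit missing raises a ValueError instead of silently failing to match.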

This error highlights the need for precision in the error reports from your technicians. Since there's so much more that can go wrong at this stage, it's nice to have an error which is relatively easy to fix.

Troubleshooting PXE in SCCM OSD Part 2
Troubleshooting PXE in SCCM OSD Part 3
Troubleshooting the TFTP Service