
James is working on a second generation inventory script.

Currently working on: REST integration for the new database type, and CPU benchmarks

Versions:

gather.rb:3.9
rest_db.rb:1.3
log_wrap.rb:0.1

Its plan is to be simpler and less ambitious than its predecessor. Instead of the old SQL interface, we'll now use the new REST client.

There are 3 parts to this process:

gatherer.rb: collects information using operating-system-based facilities and tools (dmesg, lsusb, lspci, ifconfig, /sys, lshw, sysbench).
rest_db.rb: a generic rest-client wrapper library for interfacing with the REST-based database and its flat data structure.
log_wrap.rb: a unified log class that can be shared across many distributed Ruby files.

The REST database structure is a flat resource/attribute model. Nodes and their "attached" devices are considered resources; the devices attached to a node are child resources of the node. Both nodes and their devices have meaningful attributes. Attributes come in 3 heading types: INV, CM and INF. Those are described here. Gathering is only concerned with INV-type attributes. This is what is collected and how they are related:

Nodes
1. INV_hd_sn
2. INV_mb_sn
3. INV_cpu_bench
4. INV_memory
5. INV_cpu_hz
6. INV_cpu_type
7. INV_hd_size
8. INV_check_in
Devices
1. INV_dev_id
2. INV_dev_type
3. INV_if_name (optional)
4. INV_if_mac (optional)
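As a rough illustration of the flat resource/attribute model, a node and its child device resources could be held in plain Ruby hashes before being handed to the REST wrapper. The attribute names come from the lists above; every value and the :name/:attributes/:children layout are hypothetical, not rest_db.rb's actual structure.

```ruby
# Hypothetical in-memory shape of one node resource and its child device
# resources. Attribute names follow the INV_ lists above; all values and the
# :name/:attributes/:children layout are made up for illustration.
node = {
  name: "node1-1.sb7.orbit-lab.org", # nodes are identified by their FQDN
  attributes: {
    "INV_hd_sn" => "HD-EXAMPLE", "INV_mb_sn" => "MB-EXAMPLE",
    "INV_cpu_bench" => 0.0, "INV_memory" => 491456.0,
    "INV_cpu_hz" => 1.0e9, "INV_cpu_type" => "example cpu",
    "INV_hd_size" => 156301488.0, "INV_check_in" => "2013-01-10"
  },
  children: [
    # INV_if_name and INV_if_mac are optional, so only interfaces carry them.
    { attributes: { "INV_dev_id" => "8086:1010", "INV_dev_type" => "pci",
                    "INV_if_name" => "eth1", "INV_if_mac" => "00:e0:81:26:76:9c" } },
    { attributes: { "INV_dev_id" => "0403:6001", "INV_dev_type" => "usb" } }
  ]
}

# Gathering only produces INV-type attributes; a quick sanity check.
def inv_only?(resource)
  resource[:attributes].keys.all? { |k| k.start_with?("INV_") }
end

puts inv_only?(node) && node[:children].all? { |d| inv_only?(d) }
```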

Required Tools / Libraries

Ubuntu Standard

date
hostname
lsusb
lspci
dmesg
ifconfig
lshw

apt packages

librestclient-ruby
sysbench
smartmontools

ruby standard

logger
ftools (ruby standard)

Other Packages

UHD
Netfpga

Gatherer:

Gatherer is now the only executable. It performs both gathering and updating (via the REST client). It collects information mostly via regular expressions on the output of the various tools and uses them to identify the node (by its FQDN) and populate the data parameters.
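As an example of the regex-on-tool-output approach, here is a minimal sketch that pulls interface names and MAC addresses out of old net-tools style ifconfig output. The function name, the sample text and the exact regular expression are assumptions for illustration, not the gatherer's actual code.

```ruby
# Minimal sketch: extract interface => MAC pairs from old-style (net-tools)
# ifconfig output, where the first line of each interface block looks like
# "eth0      Link encap:Ethernet  HWaddr 00:e0:81:26:70:16".
IFCONFIG_MAC = /^(\S+)\s+Link encap:\S+\s+HWaddr\s+([0-9a-f]{2}(?::[0-9a-f]{2}){5})/i

def macs_from_ifconfig(text)
  # scan returns [[name, mac], ...]; interfaces without HWaddr (lo) never match
  text.scan(IFCONFIG_MAC).to_h
end

sample = <<~OUT
  eth0      Link encap:Ethernet  HWaddr 00:e0:81:26:70:16
            inet addr:10.50.0.11  Bcast:10.50.255.255  Mask:255.255.0.0
  lo        Link encap:Local Loopback
            inet addr:127.0.0.1  Mask:255.0.0.0
OUT

p macs_from_ifconfig(sample)
```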

NOTES

9/29/09

I may have discovered the cause of the device/vendor discrepancy. Joe seems to be looking at /sys/class/net/devicename/device… perhaps this points to a different device id. I'll have to check it out.
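A quick way to check this theory is to read the ids sysfs exposes for an interface and compare them with what lspci reports for the same bus address. This sketch assumes the usual sysfs layout, where /sys/class/net/<iface>/device/vendor and .../device hold hex strings such as 0x8086; the helper name is hypothetical.

```ruby
# Sketch: read the vendor/device ids sysfs exposes for a network interface.
# Interfaces with no backing device (e.g. lo) have no device/ directory,
# in which case nil is returned.
def sysfs_ids(iface, root = "/sys/class/net")
  dir = File.join(root, iface, "device")
  return nil unless File.directory?(dir)
  %w[vendor device].map { |f| File.read(File.join(dir, f)).strip }
end

# Usage on a node: p sysfs_ids("eth0")
```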

That being said, I have a working Gatherer prototype:

ssugrim@external2:~/scripts$ ruby gatherer.rb
ssugrim@external2:~/scripts$ more /tmp/external2.xml
<external2>
 <ip_adds>
  <10.50.0.12 iface='eth1' host='external2.orbit-lab.org'/>
  <127.0.0.1 iface='' host=''/>
 </ip_adds>
 <motherboard mem_size='1048512' disk_size='156301488' cpu_num='4'/>
 <Devices>
  <pci>
   <eth0 device='1229' bus_add='01:03.0' mac='00:e0:81:26:70:16' str='Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)' vendor='8086'/>
   <eth1 device='1010' bus_add='04:01.0' mac='00:e0:81:26:76:9c' str='Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)' vendor='8086'/>
   <eth2 device='1010' bus_add='04:01.1' mac='00:e0:81:26:76:9d' str='Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)' vendor='8086'/>
  </pci>
  <usb>
   <0 device='0001' bus_add='001:001' str='Linux Foundation 1.1 root hub' vendor='1d6b'/>
  </usb>
 </Devices>
</external2>
ssugrim@external2:~/scripts$

10/2/09

Minus error checking for failed commands, the gatherer is complete. I'm now moving on to the writer. I'm going to keep them in the same script for now, so I don't have to deal with re-importing the data and extracting it from XML. At some point separating them will be a TODO, so that we can call just the gatherer if we want to.

For now, I need to determine which node I am based on the resolved host name. The scheme is nodex-y.testbedname#. I can extract the x and y coordinates from the node part; the testbed name will have to be a lookup (this should probably be in gatherer as parameters).
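The coordinate extraction can be sketched with one regular expression, assuming host names look like node1-1.sb7.orbit-lab.org (as in the XML captures on this page). The constant and method names are hypothetical, and turning the testbed name into an id is left to the lookup mentioned above.

```ruby
# Sketch: split a resolved host name of the form nodeX-Y.<testbed>.<rest>
# into its x/y coordinates and testbed name.
NODE_NAME = /\Anode(\d+)-(\d+)\.([^.]+)/

def parse_node_name(fqdn)
  m = NODE_NAME.match(fqdn)
  raise ArgumentError, "unexpected host name: #{fqdn}" unless m
  { x: m[1].to_i, y: m[2].to_i, testbed: m[3] }
end

p parse_node_name("node1-1.sb7.orbit-lab.org")
```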

Once I have that, I can look up my unique MySQL id from the MySQL database. This id will then allow me to correlate devices with the ones I have.

Following the instructions on http://support.tigertech.net/mysql-duplicate

I copied the MySQL database from inventory1 to inventory2.

One caveat is noted on http://forums.digitalpoint.com/showthread.php?t=259486

In the top of the database file you are trying to dump you will see that :
CREATE DATABASE `gunit_pimpjojo` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
Just remove this from the dump ( Notepad or wherever you have the dump)
Then re paste the file
You just need to remove that line....and you will be good to go
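The same fix can be done without an editor. This is a minimal sketch that assumes the CREATE DATABASE statement sits on a single line of the dump; the method name and file names are placeholders.

```ruby
# Sketch: drop any CREATE DATABASE statement from a mysqldump file before
# importing it into a differently named database.
def strip_create_database(dump)
  dump.each_line.reject { |line| line =~ /\A\s*CREATE DATABASE\b/i }.join
end

# Usage (file names are placeholders):
#   File.write("inventory2.sql", strip_create_database(File.read("inventory1.sql")))
```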

10/5/09

I've revamped the internal data types to fit the way xml-simple outputs and re-imports. Any multi-argument results (eth, usb, ip) return an array of hashes. This creates clean XML. I also unfolded the coords hash into single instance variables; they all get wrapped up into a single attribute.

The new xml format looks like so.

<opt x="1" y="1" disk_size="156301488" domain="sb7" cpu_num="1" mem_size="491456">
  <pci device="0013" name="ath0" bus_add="00:09.0" mac="00:60:b3:ac:2b:92" str="Ethernet controller: Atheros Communications, Inc. AR5212/AR5213 Multiprotocol MAC/baseband processor (rev 01)" vendor ="168c" />
  <pci device="0013" name="ath1" bus_add="00:0a.0" mac="00:60:b3:ac:2b:66" str="Ethernet controller: Atheros Communications, Inc. AR5212/AR5213 Multiprotocol MAC/baseband processor (rev 01)" vendor="168c" />
  <pci device="4320" name="eth0" bus_add="00:0b.0" mac="00:0f:ea:4a:8b:56" str="Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)" vendor="11ab" />
  <pci device="4320" name="eth1" bus_add="00:0c.0" mac="00:0f:ea:4a:8b:57" str="Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)" vendor="11ab" />
  <usb device="6001" name="usb" bus_add="001:009" str="Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC" vendor="0403" />
  <ip ip="10.17.1.1" host="node1-1.sb7.orbit-lab.org" iface="eth1" />
  <ip ip="127.0.0.1" host="" iface="" />
</opt>

I've also gone back to the original two-script model. Gatherer is "feature complete".

Working on the writer, I've created an internal data type called Xmldata. It has exactly the same fields as Info, but populates them from the generated XML file.

Working on the MySQL part, I have to examine the code that lives in

ssugrim@internal1:/opt/gridservices2-2.2.0/lib/ogs_inventory$

NOTE: MySQL query strings should be crafted prior to the actual execution of the query, since they don't always form the way you think they do. Also, the %&string& formulation of strings is very helpful in getting the quotes correct.
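A small sketch of both habits: the query is built into a variable and printed before anything executes it, and %&...& (Ruby's %Q literal with & as the delimiter) lets single and double quotes sit inside the string unescaped. The table and column names are hypothetical; interpolated values are not escaped here.

```ruby
# Sketch: craft the query string first, inspect it, then execute it.
# Table/column names are hypothetical; values are interpolated unescaped.
x, y, domain = 1, 1, "sb7"
query = %&SELECT id FROM locations WHERE x = '#{x}' AND y = '#{y}' AND testbed = "#{domain}"&
puts query   # check the quoting before handing it to the MySQL client
```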

10/08/09

The writer script is now equipped with two classes, Xmldata and Identify. Both can only be instantiated by the create method, making them singletons (create will only call new if an instance does not already exist). Identify instantiates an Xmldata object, and then uses the x and y coordinates and the domain to determine a location id (the globally unique identifier that ties all the tables together). I also get the max id from the Inventory ids, assuming that the highest number is the latest.
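The create-only pattern can be sketched like this: new is made private and create caches the first instance. This is a stripped-down stand-in, not the writer's Identify class; Ruby's standard Singleton module is an alternative when no constructor arguments are needed.

```ruby
# Sketch of a "create only" singleton: new is private, and create builds the
# instance on first use and returns the cached one afterwards.
class Identify
  private_class_method :new

  def self.create(*args)
    @instance ||= new(*args)
  end

  attr_reader :domain

  def initialize(domain)
    @domain = domain
  end
end

a = Identify.create("sb7")
b = Identify.create("grid")   # arguments are ignored once the instance exists
puts a.equal?(b)
```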

10/12/09

Quick edit to gatherer to convert the device and vendor tags to decimal instead of hex. The reason they didn't match before was that in the SQL database they are stored as decimal (I guess because you can't store hex in MySQL).
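The conversion itself is a one-liner in Ruby. The ids below are the Intel vendor id and the 82546EB device id from the external2 capture above.

```ruby
# lspci/lsusb print ids as hex strings; the database stores integers.
vendor_hex, device_hex = "8086", "1229"
vendor = vendor_hex.to_i(16)   # 32902
device = device_hex.hex        # 4649 (String#hex is equivalent to to_i(16))
```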

10/18/09

Writer is "feature complete". The main (non-data) class is Check_sql. Besides new, its main methods are check and update. They respectively compare the xmldata against SQL and update the database if the data doesn't match. I'd like to be more "independent" of the form of the xmldata, but that would involve a lot more dummy variables and searching of hashes.

The big TODO is mostly rescuing errors. First on the list is connection retries. Class interface descriptions to follow soon.

10/20/09

Modified both gatherer and writer to take parameters. The parameters are as follows:

Writer:
 --server = #server hostname (default: internal1.orbit-lab.org)
 --user = #username for mysql
 --pass = #password
 --db = #database name (default: inventory2)
 --input = #input file name (default: foo.xml)

Gatherer: 
 --output = #name of output file (default: stdout)
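For reference, the writer's flags map naturally onto Ruby's standard OptionParser, which accepts both --db inventory2 and --db=inventory2. This is a sketch that mirrors the list above, not the writer's actual argument handling; the method name is hypothetical.

```ruby
require 'optparse'

# Sketch of the writer's flags with the defaults listed above.
def parse_writer_args(argv)
  opts = { server: "internal1.orbit-lab.org", db: "inventory2", input: "foo.xml" }
  OptionParser.new do |o|
    o.on("--server HOST", "server hostname") { |v| opts[:server] = v }
    o.on("--user USER", "username for mysql") { |v| opts[:user] = v }
    o.on("--pass PASS", "password") { |v| opts[:pass] = v }
    o.on("--db NAME", "database name") { |v| opts[:db] = v }
    o.on("--input FILE", "input file name") { |v| opts[:input] = v }
  end.parse!(argv)
  opts
end

p parse_writer_args(%w[--user inv --db=inventory52])
```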

Also, writer now only checks vendor and device id. If no match is found, it will add it with the description string.

10/26/09

Modifying gatherer to use lshw to collect the uuid (motherboard serial number). Also changing the internal data types to more closely match the table contents, e.g. devices and motherboards.

Originally I thought to just use lshw to gather most of the information, but this doesn't really gain us anything, since I would have to resort to the other tools (lsusb and lspci) to find the relevant entries in the lshw output (and it would require a huge rewrite of the gatherer). Lshw can output in a few different ways. I'm currently using the direct line-by-line approach to search for the uuid. I did, however, experiment with the -xml output. When imported with XmlSimple.xml_in(), we get a massive hash/array structure that contains all the data elements as either a value in a hash or an item in an array. To find what we're looking for, we need a recursive tool to extract the relevant data structures. An example was written in a file titled lshw_recursion_example.rb. The main recursive tool keeps calling the each method of every sub-element (hashes and arrays both have an each method, but they behave differently, so a check for class is needed first).

One snag that had to be "hacked" around: if we find the keyword in an array and store the containing data structure, all we get is an array with that keyword. I resolved this by passing the previous data structure as a parameter. If the keyword is found in a hash I store the hash; if it's found in an array, I store the previous hash. I opted to hunt for a list of words instead of a single one. It's more efficient than iterating across the entire structure once for each word. We don't iterate through the words for every element, just the ones that terminate a branch (the string leaves). This saves a lot of computation and prevents a few type errors. It's assumed that the word list is significantly smaller than the size of the hash. Some example code:

def hunt(dat, words, found, prev = nil)
  # check the type
  if dat.kind_of?(Array)
    dat.each do |v|
      # check each string element for an instance of the words; a string found
      # in an array is stored against the enclosing hash (prev), not the array
      words.each { |w| found.store(w, prev) if /#{w}/.match(v) } if v.kind_of?(String)

      # recursively call the function on the children of this data structure;
      # an array is never the container we want, so keep passing the enclosing hash
      hunt(v, words, found, prev)
    end
  elsif dat.kind_of?(Hash)
    dat.each do |_k, v|
      # same deal as the array except we have a key/value combo, and we can store
      # the current data structure. The current hash is passed down as the parent
      # since we don't know what type the child is
      words.each { |w| found.store(w, dat) if /#{w}/.match(v) } if v.kind_of?(String)
      hunt(v, words, found, dat)
    end
  end
  found
end

found = Hash.new
# e.g. hunt(XmlSimple.xml_in(lshw_xml_output), ["serial", "product"], found)

11/4/09

I'll need to revisit the use of recursion for lshw. I have some working ideas on how to do it. Ivan suggested multi-tier iterations where I hunt for keywords following some kind of "path of keywords". Using "hunt" multiple times with a sequence of keywords (examining keys as well as values), we should be able to iteratively extract smaller and smaller data structures that contain more relevant information.
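One way Ivan's "path of keywords" idea could look: for each keyword in turn, find the first hash (depth first) whose keys or string values contain it, then continue the search inside that hash only. Both method names are hypothetical, and the sample structure is a simplified, made-up version of what XmlSimple returns for lshw -xml.

```ruby
# Sketch: depth-first search for the first hash that mentions +word+ in a key
# or in a string value.
def find_hash(dat, word)
  case dat
  when Hash
    return dat if dat.any? { |k, v| k.to_s.include?(word) || (v.is_a?(String) && v.include?(word)) }
    dat.each_value { |v| r = find_hash(v, word); return r if r }
  when Array
    dat.each { |v| r = find_hash(v, word); return r if r }
  end
  nil
end

# Follow the keywords in order, narrowing the search scope at each step.
def hunt_path(dat, path)
  path.reduce(dat) { |scope, word| scope && find_hash(scope, word) }
end

data = { "node" => [{ "id" => "core", "node" => [
  { "id" => "memory", "size" => "1GiB" },
  { "id" => "cpu", "product" => "Xeon" }
] }] }
p hunt_path(data, %w[core cpu])
```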

More immediate are the changes that need to be made to writer to reflect the table structure in MySQL. They are:

Need to get the motherboard id from the table, matching against serial number.
Update node to correlate motherboard to location (when they move).
Motherboard updates should only modify disk and memory (the motherboard id should not change).
If a motherboard is not found, then we insert it.
Should get the node id from the SQL table, matching against location.

11/17/09

Modifications to writer have been completed (preliminary checks worked).

Reverted Db.hupdate to only update. The calling functions should decide whether to insert or update.
Mb_id now checks against serial instead of location in the Identify class.
update_mb now checks for mb_id. If the ID is present it will update the record; otherwise it will insert WITHOUT specifying an ID, since SQL should auto-increment the ids.
Nodes are uniquely identified by a triple of (node_id, location_id, motherboard_id). It's assumed that the (node_id, location_id) portion is constant. Thus the only change/update we should check for and perform is to ensure that the motherboard id matches for a given (node_id, location_id) pair. The update_node function only modifies motherboard_ids.
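The update-or-insert decision in update_mb can be sketched as a function that only builds the SQL text. The table and column names (motherboards, id, serial, disk_size, memory) and the method name are hypothetical.

```ruby
# Sketch: known boards only get disk/memory updated; new boards are inserted
# without an id so MySQL's AUTO_INCREMENT assigns one.
def motherboard_sql(mb_id, serial, disk, mem)
  if mb_id
    "UPDATE motherboards SET disk_size = #{disk}, memory = #{mem} WHERE id = #{mb_id}"
  else
    "INSERT INTO motherboards (serial, disk_size, memory) VALUES ('#{serial}', #{disk}, #{mem})"
  end
end

puts motherboard_sql(nil, "SN1", 160.0, 1.0)
```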

Things that need to be done:

Move all the "checks" into the get methods (deprecate the get methods). check() should simply call the sub_check methods and retain a hash of matches for specific data structures (table entries).
Update can then iterate over that hash and call the respective update function for each given table.
To that end, the update_device method needs to be split in two to reflect the data structures.
The data structure design paradigm should be to have one data structure for each table that needs to be checked/updated. It's pretty close to this, but needs a little work.
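The proposed check/update split can be sketched as a hash of per-table results that update walks, dispatching to update_<table> only where the check failed. The class, table names and hard-coded sub-check results are placeholders, not Check_sql's real methods.

```ruby
# Sketch: check() keeps one boolean per table; update() calls the matching
# update_<table> method for every table that did not match.
class CheckSketch
  attr_reader :updated

  def initialize
    @updated = []
  end

  def check
    @matches = { motherboards: check_motherboards, devices: check_devices }
  end

  def update
    check unless @matches
    @matches.each { |table, ok| send("update_#{table}") unless ok }
  end

  private

  def check_motherboards; true; end   # pretend the motherboard row matches
  def check_devices; false; end       # pretend a device row is stale

  def update_motherboards; @updated << :motherboards; end
  def update_devices; @updated << :devices; end
end

c = CheckSketch.new
c.update
p c.updated
```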

11/23/09

The previous was completed. There were two bugs that needed tweaking:

Update node did not update the inventory when it updated a node's info.
Had to add a hack to prevent unknown dev_ids from getting double-entered in update_adds. If the node has multiple instances of a piece of unknown hardware (like 2 new Ethernet cards), the current routine will double-add them.
- This hack should be revisited for efficiency: currently it double-checks for a kind (in case one was added after the adds_array was populated). This is very wasteful, as missing kinds should be a rare event. I should probably switch to a different function if I've entered the rare "never seen it before" scenario.
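One way to avoid the double check is to resolve unknown kinds once, up front: collect the distinct (vendor, device) pairs, insert only those not already known, then handle devices. The method name and the insert_kind callback are hypothetical stand-ins for the SQL side.

```ruby
# Sketch: insert each unknown kind exactly once, even when a node carries
# several identical unknown cards. known_kinds maps [vendor, device] => id.
def add_missing_kinds(devices, known_kinds, insert_kind)
  missing = devices.map { |d| [d[:vendor], d[:device]] }.uniq - known_kinds.keys
  missing.each { |kind| known_kinds[kind] = insert_kind.call(kind) }
  known_kinds
end

devices = [{ vendor: 0x8086, device: 0x1010 }, { vendor: 0x8086, device: 0x1010 }]
inserted = []
kinds = add_missing_kinds(devices, {}, ->(k) { inserted << k; inserted.size })
p inserted
```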

11/24/09

Fixed a few bugs from previous edits:

I added a kind check/update that precedes the device update, so that kinds are always populated before devices.
Each update now calls check (except for devices, as they're last). Update mb also repopulates the mb_id.
The mb_id information was moved from Identify to Check_sql since it's dynamic and properly belongs there. Identify no longer has a mb_id method.

5/19/2010

Adding error handling and logging to the writer script:

The require 'logger' plus code to actually log info and errors is now in place.
For the Db.connect method I've added a begin/rescue/end block. It sleeps for 60 + rand(60) seconds, then tries to connect again, but I'll have to fine-tune its behavior to only reconnect when it can't reach the server, instead of reconnecting every time. It also logs how long it's going to wait until it tries again.
Jack is stripping things for gatherer, but we'll eventually merge that code.
There are 5 places where I need to try and catch exceptions; most of the other exceptions I want to go unhandled so the script terminates. I'll need to put in a bunch of debug logging to make sure my data is what I think it is.
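The retry behavior can be sketched as a wrapper around the connect call. The wait formula, attempt limit and sleep are parameters so they can be tuned (or stubbed out in tests); rescuing StandardError is deliberately broad here, and narrowing it to the client's "can't reach the server" errors is the fine-tuning mentioned above. Names are hypothetical.

```ruby
require 'logger'

# Sketch: run the connect block, and on failure log the wait, sleep, retry.
def connect_with_retry(log, attempts: 3, wait: -> { 60 + rand(60) }, sleeper: method(:sleep))
  tries = 0
  begin
    tries += 1
    yield
  rescue StandardError => e
    raise if tries >= attempts   # give up and let the caller see the error
    delay = wait.call
    log.error("Db.connect - #{e.message}; retry #{tries}/#{attempts - 1} in #{delay}s")
    sleeper.call(delay)
    retry
  end
end

log = Logger.new($stdout)
calls = 0
result = connect_with_retry(log, wait: -> { 0 }, sleeper: ->(_) {}) do
  calls += 1
  raise "connection refused" if calls < 3
  :connected
end
p result
```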

5/22/2010

I've put a bunch of logging and error handling code into writer version 0.99. It should now log appropriately. To generate a bunch of meaningful results I've taken these steps:

Writer now logs to /tmp/writer.log
Wrote a new script called logcopy.rb
1. It checks the log for errors.
2. If errors are found, it uses the mount command to mount the tmp directory on repo1 at /mnt:
```
repository1:/export/orbit/image/tmp /mnt
```
3. In the tmp directory is a new directory called logs.
4. The /tmp/writer.log file is copied to this logs directory and stamped with the name of the node it came from.
I've created a new image called inventoryV2.ndz, which has all the updated scripts (writer and logcopy).
I've modified the inventoryV2.rb script to call logcopy as the final step; it's now named inventoryV2-1.rb.
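The core of steps 1 and 4 can be sketched as follows; mounting and unmounting the share are left out. The method name, the "host-logname" naming scheme and the use of "ERROR" (the severity label Ruby's Logger writes) as the error marker are assumptions.

```ruby
require 'fileutils'
require 'socket'

# Sketch: if the log contains errors, copy it into the shared logs directory
# under a name stamped with this node's host name. Returns the copy's path.
def copy_log_if_errors(log_path, dest_dir, host = Socket.gethostname)
  return nil unless File.read(log_path).include?("ERROR")
  FileUtils.mkdir_p(dest_dir)
  target = File.join(dest_dir, "#{host}-#{File.basename(log_path)}")
  FileUtils.cp(log_path, target)
  target
end

# Usage on a node, after mounting the share on /mnt:
#   copy_log_if_errors("/tmp/writer.log", "/mnt/logs")
```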

6/24/2010

I forgot to log a bunch of changes:

Found a bug because I reduced the wait time for the experiment to complete. Since the connect retry was rand(60)+60, my log copies would copy over incomplete files, because logcopy waited only 30 seconds before copying. I've redone the numbers: I wait only rand(20) before attempting to reconnect, but now try 3 times. I wait 90 seconds before trying to copy the log file; this should give me enough time to capture the retries.
There was a cascade of query failures due to the fact that I was searching for the testbed id with the short domain name instead of the FQDN in the testbed table. This value was given to me by the gatherer; I've since modified the gatherer to use the FQDN. All the other queries depended on this number, so the error propagated down.
This error, however, demonstrated a specific flaw in how I handled empty query results. I indicated failed queries by returning nil instead of an array. The only reason I caught the error was that I tried to flatten the nil return. I've updated this to raise an exception if a nil query result occurs for any of the members of the Identify class. Not being able to identify the node should be a fatal error. This exception is unhandled, so it propagates up to the main block, where it gets logged and the script terminates.
Also added a little bit of logging to gatherer, but not much. I should really fix its error handling.
Made Check_sql.check_in a public method and had it called last in MAIN.
Noticed that sometimes I get an "xml create object" error. I'll have to figure out why that's happening. It's probably due to gatherer not completing properly, but now I should be able to find the nodes where it happens.
Trying to stick to the logging/exception raising convention of setting the error text to: "class.method - error"
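A sketch of this convention: an empty lookup inside the identification step raises with "class.method - error" text, nothing rescues it locally, and the top-level routine logs it as fatal. The class, method names and lookup lambda are stand-ins for the real Identify class and its SQL query.

```ruby
require 'logger'

# Sketch: a nil/empty identification result is fatal.
class IdentifySketch
  def initialize(lookup)
    @lookup = lookup   # stands in for the SQL query
  end

  def location_id(fqdn)
    rows = @lookup.call(fqdn)
    raise "Identify.location_id - no location for #{fqdn}" if rows.nil? || rows.empty?
    rows.flatten.first
  end
end

def run_identify(log, lookup)
  IdentifySketch.new(lookup).location_id("node1-1.sb7.orbit-lab.org")
rescue => e
  log.fatal(e.message)
  nil   # the real script would terminate here
end

log = Logger.new($stdout)
p run_identify(log, ->(_) { [[42]] })
p run_identify(log, ->(_) { nil })   # logs the FATAL line
```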

6/25/2010

We're going live! I'll actively update inventory52.

Few minor tweaks:

Edited logcopy to wait only 30 seconds before unmounting.
Set the default to be inventory52, since this is now going to be our main table.

Since I now check in last and call it externally, I should include the number of changed lines in my check-in. That would be helpful for diagnostics.

6/29/2010

After some thought, I realized that writer should call logcopy as its last action. This ensures that logcopy copies a complete file. It avoids a timing problem where the inventory script would have to guess a reasonable time for writer to complete. Logcopy is guaranteed a properly closed file, as writer controls when logcopy is called. I could have put this in an at_exit block, but I just left it in the MAIN ensure block. I used the system call:

 system("/usr/bin/ruby /root/logcopy.rb")

Note the explicit paths: the OMF exec subshells don't understand relative paths. I could have used exec, but it replaces the current process with the one being exec'd. While this could have worked, it would have prematurely terminated writer without closing out all the objects. That probably shouldn't matter, but it's not neat. While checking on the execution I noted that at the point where writer invokes logcopy,

The 3 main competing strategies are system, exec, and %x[], where the last one is very similar to backticks. There are also popen and popen3. System is good enough for these purposes since I only care about its execution, not its output.
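A sketch of the arrangement and the alternatives: logcopy runs from an ensure clause, so it fires whether writer finishes or raises. The run_writer body is a placeholder. For comparison, system waits and returns true/false, %x[] captures output instead of printing it, and Open3.capture2 (a convenience wrapper in the same stdlib module as popen3) returns the output plus a Process::Status; exec would replace the Ruby process, so no ensure code after it would run.

```ruby
require 'open3'

# Sketch: logcopy is invoked from ensure, after writer's work is done.
def run_writer
  # ... gather, check, update ...
ensure
  # Explicit paths, since the OMF exec subshells don't resolve relative ones.
  system("/usr/bin/ruby /root/logcopy.rb")
end

# The alternatives, for comparison:
ok = system("echo hi")                        # prints "hi", returns true
out = %x[echo hi]                             # returns "hi\n" instead of printing
out2, status = Open3.capture2("echo", "hi")  # output plus a Process::Status
```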

I've created a new inventory image and inventory script to reflect this change: james_inv_2.ndz and inventory2-2.rb are a testing pair. They'll replace the last known good pair: james_inv.ndz and inventory2-1.rb

TODO: Have logcopy do a sanity check for files and replace them.

11/16/2010

Gatherer version 2 is now in the works. It's essentially a complete rewrite where we standardize on lshw as the method of getting hardware information. Instead of running regexps on lspci, lsusb and the like, we will use lshw. There may be some issues with USB Ethernet devices (depending on how lshw classifies them vs. how the database treats them). That'll have to be worked out.

I've revamped the argument processing (using a standard library) so that it's much cleaner. This is done in the main function body. I'm also using the logger more liberally, since this is expected to be a standard. By default logs go to STDOUT, but that can be changed with a flag. It would be nice to make the output file path a mandatory argument, but I've not figured out how to do that yet.
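OptionParser has no built-in notion of a required option, so one common approach is to check after parsing and raise OptionParser::MissingArgument. This sketch also sends the log to STDOUT unless a log file is given. The -l and -d flags appear in the inventory script's gatherer call further down, but reading -d as "debug" is an assumption, and -o/--output and the method name are hypothetical.

```ruby
require 'optparse'
require 'logger'

# Sketch: logging defaults to STDOUT; --output is enforced after parsing.
def parse_gatherer_args(argv)
  opts = { log: $stdout, debug: false }
  OptionParser.new do |o|
    o.on("-o", "--output FILE", "output file (required)") { |v| opts[:output] = v }
    o.on("-l", "--log FILE", "log file (default: STDOUT)") { |v| opts[:log] = v }
    o.on("-d", "--debug", "debug logging") { opts[:debug] = true }
  end.parse!(argv)
  raise OptionParser::MissingArgument, "--output" unless opts[:output]
  logger = Logger.new(opts[:log])
  logger.level = opts[:debug] ? Logger::DEBUG : Logger::INFO
  [opts, logger]
end
```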

The new model is much simpler since we only need lshw. There is a parent class called Component that defines a poll method. Each component should be able to poll for its data; however, the flag (to define what information to get from lshw) and the regexp are uninitialized arguments (no defaults, so there is much bitter complaining if I forget one, by design). I use popen3 to call lshw. This suppresses any extraneous output lshw makes. I also scan stderr and raise an exception if lshw is not found (its path is a configurable parameter). Component should not be instantiable; instead, all other data classes should be derived from it and provide named accessors to the derived components.
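A sketch of the Component idea under a few assumptions: it uses Open3.capture3 (a wrapper around popen3) so lshw's stdout and stderr are captured rather than printed, treats a missing binary (Errno::ENOENT) as "lshw not found", and refuses direct instantiation. The Network subclass and its macs accessor are hypothetical; lshw reports a network interface's MAC on its "serial:" line.

```ruby
require 'open3'

# Sketch: abstract parent that runs lshw for one device class and returns the
# stripped output lines matching a regexp.
class Component
  def initialize(lshw = "/usr/bin/lshw")
    raise NotImplementedError, "Component.new - derive from Component" if instance_of?(Component)
    @lshw = lshw
  end

  # flag and regexp deliberately have no defaults.
  def poll(flag, regexp)
    out, err, status = Open3.capture3(@lshw, "-c", flag)
    raise "Component.poll - #{@lshw} failed: #{err.strip}" unless status.success?
    out.lines.grep(regexp).map(&:strip)
  rescue Errno::ENOENT
    raise "Component.poll - #{@lshw} not found"
  end
end

class Network < Component
  def macs
    poll("network", /serial:/)
  end
end
```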

11/19/2010

After trying to build the network class, I realized that the common set of things that need to be done by each child does NOT involve searching, meaning that each child will have to "scrape" the lines a little differently. However, each child does need to run lshw, and should require it to spit out an array of lines or a "folded" array of lines where each fold occurs at a marker; typically the marker is *-, which is the default method of output. In version 2.08 onward I've replaced the collection of functions in Component (the parent) with a single function, lshw_arr, which takes a flag (for the -c argument) and an optional marker. If the marker is specified I use Array.search and Array.map to fold the array; otherwise I just return it straight. All elements of the array are .strip(ed) and .flatten.compact(ed) to ensure sanity of the return values. I don't want to pass around nested arrays. If something needs nesting I'll make it a new object, but return values should have at most 1 level of nesting.
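The folding step can be sketched with Enumerable#slice_before, which starts a new group at every line that begins with the marker. This swaps in a different technique from the Array.search/map approach described above, and the sample lines are abbreviated, made-up lshw output.

```ruby
# Sketch: clean the lines, then fold them into groups at each marker line.
def fold_lines(lines, marker = nil)
  clean = lines.flatten.compact.map(&:strip)
  return clean unless marker
  clean.slice_before { |l| l.start_with?(marker) }.to_a
end

sample = [
  "  *-network:0",
  "       product: 82546EB Gigabit Ethernet Controller",
  "       logical name: eth1",
  "  *-network:1",
  "       logical name: eth2"
]
p fold_lines(sample, "*-")
```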

The lshw webpage is http://ezix.org/project/wiki/HardwareLiSter. This page has the device class listings and other documentation.

12/8/2010

Fixed a few bugs involving how the array slices were processed. I ran into a problem where the device count would get messed up if lshw returned non-unique values. I solved it by reversing the array before slicing (documented in the code).

1/8/2011

I've added the sql_query method to the main Component class. All child classes will query SQL the same way. The connect and disconnect methods are also Component class methods. I thought I had previously added a discussion about various connection models. The two competing ideas were:

Let each child handle their own connections
Let the main program handle the connection.

I decided to put the connection task in the main function because fewer connection attempts would be made (this should be more stable), and I can use a single begin/rescue/ensure block to make sure the connection closes.

I've added an abstract method (see here) to Component called update (in the parent it raises a NotImplementedError). This forces all children to have an update method (which should then be implemented).
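Both decisions can be sketched together: the parent's update raises NotImplementedError until a child overrides it, and the main routine passes one shared connection to every component and closes it in ensure. AbstractComponent, Usb, FakeConnection and run are hypothetical stand-ins; FakeConnection replaces the MySQL client.

```ruby
# Sketch: abstract update in the parent, one connection closed in ensure.
class AbstractComponent
  def update(conn)
    raise NotImplementedError, "#{self.class}.update - not implemented"
  end
end

class Usb < AbstractComponent
  def update(conn)
    conn.log << "usb updated"
  end
end

class FakeConnection
  attr_reader :log
  def initialize; @log = []; @open = true; end
  def close; @open = false; end
  def open?; @open; end
end

def run(components, conn)
  components.each { |c| c.update(conn) }
ensure
  conn.close   # runs on success and on any exception
end
```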

I used an interesting construction in the sql_query method that's worth mentioning. Given an array of parameters to glue together, I can string them without a trailing separator like so: for a = ["A", "B", "C", "D"], a.first(a.length-1).map{|x| x + ","}.join + a.last. This gets the elements as a single string with separators. (Array#join also takes a separator argument, so a.join(",") gives the same result more cleanly; it turns nil elements into empty strings, whereas the map version raises on a nil.)

I also started using Array.zip to iterate over two arrays with map. If I have arrays A and B and I need to do something to both, I can do C = A.zip(B).map{|x| f(x[0],x[1])}. I'm doing this with the individual data elements that need to be pushed back into the tables; that way I can just zip the individual data arrays into one big one later. See the kind construct.

Finally, the sql_query method now has a .flatten in the last statement, since the row and query operations return a nested array. I've made it a policy that the query should return a flat array of answers.
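The three array idioms above, side by side. The column names, interface names and row values are illustrative only.

```ruby
cols = ["vendor", "device", "description"]

# Map-then-append (note the join before adding the last element) vs. join:
joined_map = cols.first(cols.length - 1).map { |a| a + "," }.join + cols.last
joined = cols.join(",")

# zip + map to combine two parallel arrays element by element:
names = ["eth1", "eth2"]
macs = ["00:e0:81:26:76:9c", "00:e0:81:26:76:9d"]
pairs = names.zip(macs).map { |n, m| "#{n}=#{m}" }

# Query results come back as rows; flatten gives one flat list of answers.
rows = [[1], [2], [3]]
ids = rows.flatten
```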

1/12/2011

I've made a fundamental mistake in my assumptions. The update method should not just insert things that are missing; it should also check that the data matches what is currently in the database.

I'll need insert and update methods in Component.

1/14/2011

The Component sql_insert and sql_update methods have been added. I might reconsider how I pass parameters to the latter, since I have to fold things into a hash before I pass them up; it might be cleaner to use 2 arrays and zip them rather than passing a hash. Now the update method needs to be redone.
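A sketch of hash-driven statement builders; keys.zip(values) is the "two arrays" alternative in action. Quoting here is deliberately naive (single quotes doubled); real code should rely on the MySQL client's escaping. The function signatures are assumptions, not the Component methods themselves.

```ruby
# Sketch: build INSERT/UPDATE statements from column => value hashes.
def quote(v)
  v.nil? ? "NULL" : "'#{v.to_s.gsub("'", "''")}'"
end

def sql_insert(table, data)
  cols = data.keys.join(", ")
  vals = data.values.map { |v| quote(v) }.join(", ")
  "INSERT INTO #{table} (#{cols}) VALUES (#{vals})"
end

def sql_update(table, data, where)
  sets = data.keys.zip(data.values).map { |k, v| "#{k} = #{quote(v)}" }.join(", ")
  conds = where.map { |k, v| "#{k} = #{quote(v)}" }.join(" AND ")
  "UPDATE #{table} SET #{sets} WHERE #{conds}"
end

puts sql_insert("devices", kind: 3, address: "04:01.0")
```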

1/18/2011

I've made a couple of big changes.

There is now a device subclass of Component, which network and usb should be subclasses of. It implements the get_device_kind method, which takes the identifying numbers and searches for the device_kind; if the kind is missing, it inserts it into the table and then pulls the kind_id back from the table.
- It could be argued that the current implementation of get_device_kind requires too many parameters (vendor, device, desc, bus, inv_id), but I can't think of a cleaner way to pass that information; they're all device-dependent.
Heavily modified the update method of the network class. It pulls an array of arrays (one for each interface) from SQL via a map(sql_query()) call, then forms a similarly structured gathered array (which uses the newly implemented get_device_kind method). I use a collection of zips and maps to compare the arrays and make an array of booleans, one for each interface. Since I'm using the Array.eql? method to determine whether the "data" matches, the match array is type-safe: it always gets a boolean, even if the SQL data matrix is empty. Using the match array, I zip the collected information and then push updates and inserts as necessary.
- To make the checks happen I stick with arrays most of the way down, and only "cast" to a hash when I need to pass a parameter. I use hashes in both insert and update for uniformity. Really, insert could be done properly with 2 arrays instead of a hash (that's the way it's used internally anyway), but the interface is less cryptic, and the hash processing is done with maps and zips (on the keys array), so it's not much different. In the update method I actually use the hash structure.
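A sketch of both pieces: a get-or-insert kind lookup (an in-memory hash stands in for the kinds table), and the eql?-based match array, where comparing against a missing SQL row (nil) simply yields false. Row layout, ids and the KindTable class are hypothetical, and the comparison assumes both arrays list interfaces in the same order.

```ruby
# Sketch: get-or-insert for device kinds plus an eql?-based match array.
class KindTable
  def initialize
    @rows = {}   # [vendor, device] => kind_id
  end

  # Look the kind up; if missing, "insert" it and return the new id.
  # desc would be stored alongside on a real insert.
  def get_device_kind(vendor, device, desc)
    @rows[[vendor, device]] ||= @rows.size + 1
  end
end

kinds = KindTable.new
gathered = [
  [kinds.get_device_kind(32902, 4112, "82546EB"), "04:01.0"],
  [kinds.get_device_kind(32902, 4112, "82546EB"), "04:01.1"]
]
from_sql = [[1, "04:01.0"]]   # only one interface is in the database so far

# One boolean per interface; a missing SQL row (nil) never matches.
matches = gathered.zip(from_sql).map { |g, s| g.eql?(s) }
actions = gathered.zip(from_sql, matches).map { |g, s, ok| ok ? [:keep, g] : [s ? :update : :insert, g] }
p actions
```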

1/20/2011

I've done away with the syspath method for getting vendor information; instead I use lshw -numeric. I've corrected how the CPUs get enumerated and counted. On the quad-core CPU the fold count was 20 (massive duplication); I filter by looking for a size attribute (originally serial number, but that isn't consistent across platforms). I had to add some unit conversions to be able to push data into the MySQL table, but right now it's busted. I'll need to rethink how it's getting stored/converted.
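The size filter can be sketched as keeping only the folds that carry a "size:" line. The sample folds are made up and abbreviated; which lshw entries lack a size on a given platform is an assumption to verify against real output.

```ruby
# Sketch: keep only CPU folds that report a size, dropping duplicate entries.
def real_cpus(folds)
  folds.select { |fold| fold.any? { |line| line.start_with?("size:") } }
end

folds = [
  ["*-cpu:0", "product: Example CPU", "size: 2GHz"],
  ["*-logicalcpu:0", "description: Logical CPU"],
  ["*-logicalcpu:1", "description: Logical CPU"]
]
p real_cpus(folds).size
```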

The update method in network has an error: it's comparing inventory ids and incorrectly updating if those don't match. I'll need to drop them out of the set of things to compare. Had to modify the storage types in motherboard to be bigints, so I could store the converted numbers and then pull them out the same way.

1/25/2011

The script is mostly done. I've added an update method to the USB class (a subclass of device). There were a few sanity-check tweaks that had to be added (empty vendor string, disks with no size, e.g. CD-ROMs). That being said, the omf-5.2 setup required a complete rewrite of the calling/executing script. The newest version is now named inventory-5.2.rb. The checking/inserting/updating function of the update method remains mostly the same as in network; however, I've trimmed the header array a bit, since I don't have as much information for USB as I do for network.

All the non-hardware-specific information has been pulled into a new System class (node_id, inventory_id, location_id, etc.). It's a subclass of Component (mostly for the MySQL stuff), but doesn't require any "external" references.

The omf-5.2 problems are still being worked on, but once those are ironed out we can call the "script" portion done. The final step requires moving the functionality to a PXE image, and then tweaking the "load" experiment to become the "inventory" experiment (by replacing frisbee lines with ruby /gatherer.rb lines).

I think version 2.23 is feature complete (with the exception of dreaming up a way to figure out USB devices' MAC addresses, and making some of the other flags, e.g. xml, file, etc., do something).

In the inventory script I call gatherer.rb with -d -l /tmp/gatherer.log, so I can check the log if anything goes wrong. Since we're using the def application facility, if the script throws any errors, the framework captures them and brings them back to the console.

4/27/2011

I've switched to git for version tracking from version 2.23 up. There were a couple of tweaks to the way things work; I've catalogued them in git commit messages, which are enumerated here:

c9afc57806b835f72ce7b99fc2190e2641034126 some documentation changes, and some addtional error output
d229e4ce5d1c31d432667f5cfc24a3ae4053e357 minor tweak for nil mb_sn entries, Since I can't get it some times, I just push a string with the HD serial duplicated
1182052c78f98b1c0eec4755f16bc31d166e780a cleaner implemention of the fqdn checks
8ca9e15bd53fad4e06b32ff1afb8f9d4ac00f3dd fixed the fqdn to try diffrent portions of the domain. This should be functional on inventory2 and 52 now
d9ef189f2919f987abb9cd55bbe258054bd09244 pushing in what ever I fixed, going to add fqdn support
60c9512abbfff8db6bd046523a26788a4e3df379 fixed a sql_query bug, where it should return an empty array if zero results were collected.
2eb417654e1da36a55beec7fe8ac8ddd74e74c06 Changed the way we identify network devies to use MBID and bus address instead of mac, this should be more forgiving to the pxe image
8cf565b6a34002a31a9cc1835e244ab3582e57ad made the script more tolerant of missing information so that it works on the pxe image, and a few bug fixes
dde0a2ce9cdf55888e691193bdbea4fc6a3cac07 Fixed tool tips, and corrected Motherboard.updates code, preforms less checks of @UUID
2a5d937cf0b46fe056fe0a3b8ac5e0e71968ed35 Fixed Motherboard.update because the sql matches were breaking
751f9c2794cbc9b87f712703ceb0d5062d366031 Fixed motherboard.update/get_mb_id to deal with missing UUID
984a36ce4b73c65d44e5454cb2023540ce4eb4fd modified the get_location_id to include console information
d587df7477f4f6e23d84c1a3a5d7f31ae462e974 Initial commit, version 2.23

The core operation remains the same; however, there were a few major revisions for functionality. They are:

Changed the way sql_query returns, so that it always returns an array.
When checking for loc_id, I split the FQDN into strings and then try each string.
Instead of pushing an empty string in for MB_serial when I don't have it, I push a string that uses the disk serial, to make them distinct.
If I can't find the motherboard serial, I use the disk serial instead to get the motherboard id.
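The FQDN fallback can be sketched as trying progressively shorter suffixes of the host name until a lookup succeeds. The lookup lambda stands in for the SQL query, and the exact candidates and order the real script tries are not recorded here, so treat both method names and the suffix order as assumptions.

```ruby
# Sketch: try progressively shorter suffixes of the FQDN as testbed names.
def fqdn_candidates(fqdn)
  parts = fqdn.split(".")
  (0...parts.size).map { |i| parts[i..-1].join(".") }
end

def find_location_id(fqdn, lookup)
  fqdn_candidates(fqdn).each do |name|
    id = lookup.call(name)
    return id if id
  end
  nil
end

table = { "sb7.orbit-lab.org" => 5 }
p find_location_id("node1-1.sb7.orbit-lab.org", ->(n) { table[n] })
```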

Most of the tweaks and adjustments were done to accommodate the new method of inventorying. Instead of imaging the nodes and then running inventory as an experiment, we've moved the script to an initramfs image; we boot into a PXE image and run the inventory from this in-memory initramfs. The snag is that this kernel/initramfs image doesn't load a full complement of drivers, so some of the information that would be gathered and populated doesn't get collected (e.g. the motherboard serial number). That was the bulk of the modifications. We're live now and running the updates against the working inventory52 database (which is read by omf-5.2).

The only change I had to make to the tables was to the data types of the memory, CPU speed, and disk size columns. I changed them to floats, which makes the internal representation a little cleaner. That meant adding a function to the data handler that converts strings to floats; it's part of the Component class.
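A minimal sketch of such a helper follows. It assumes the helper simply extracts the first number from a tool's output string. The method name `Component.to_float` and the regular expression are assumptions, not the script's actual code. Unit conversion (MB vs GB, MHz vs GHz) is deliberately left out.

```ruby
# Hypothetical sketch of a string-to-float helper in the spirit of the one
# added to the Component class: pull the first number out of a tool's output
# and return it as a Float, or nil when there is none. No unit conversion.
class Component
  NUMBER = /[-+]?\d+(?:\.\d+)?/

  def self.to_float(str)
    return nil if str.nil?
    match = str.to_s.match(NUMBER)
    match ? match[0].to_f : nil
  end
end

p Component.to_float('2.40 GHz')            # prints 2.4
p Component.to_float('MemTotal: 2048 kB')   # prints 2048.0
p Component.to_float('unknown')             # prints nil
```

Returning nil rather than 0.0 for unparseable input keeps "missing" distinguishable from "zero" when a tool is unavailable, for example on the PXE image.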

Note: By default the inventory command runs on all nodes. You can pass a flag to select a different node set, but its syntax is not intuitive.

All nodes:
root@console:/root# omf-5.2 exec inventoryNode.rb

Nodes [3,1..12]:
omf-5.2 exec inventoryNode.rb -- --nodes [3,1..12]
NOTE: there is no = sign between --nodes and the node set.
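To make the node set notation concrete, here is a hypothetical Ruby sketch that expands it into host names. It assumes that "[x,y]" gives grid coordinates, that each field is either a number or an a..b range, and that hosts are named nodeX-Y (as in node1-1). This is an illustration of that assumed reading, not OMF's own parser.

```ruby
# Hypothetical sketch: expand a node set written as "[x,y]" (each field a
# number or an a..b range) into host names. ASSUMES x and y are grid
# coordinates and hosts are named nodeX-Y, as in node1-1.
def expand_field(field)
  if field.include?('..')
    first, last = field.split('..', 2).map { |s| Integer(s) }
    (first..last).to_a
  else
    [Integer(field)]
  end
end

def expand_node_set(spec)
  m = spec.strip.match(/\A\[\s*([\d.]+)\s*,\s*([\d.]+)\s*\]\z/)
  raise ArgumentError, "unrecognized node set: #{spec}" unless m
  expand_field(m[1]).product(expand_field(m[2])).map { |x, y| "node#{x}-#{y}" }
end

nodes = expand_node_set('[3,1..12]')
puts nodes.first, nodes.last, nodes.size   # node3-1, node3-12, 12
```

When typing the node set in a shell, quoting it (`--nodes "[3,1..12]"`) stops the shell from treating the brackets as a glob pattern. The quoting does not change what the program receives.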

I may branch off a threaded version to make it a little faster, but that might run into blocking issues with concurrent runs of lshw. Also on the horizon is the idea of using a full diskless Ubuntu Server PXE image to house the inventory process. This should resolve some of the missing info/driver issues.
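One way a threaded version could avoid overlapping lshw runs is sketched below. It is only a sketch of the idea, not an existing part of gatherer.rb, and the `ThreadedGatherer` class is hypothetical. Jobs marked exclusive share a single Mutex, so at most one of them runs at a time while every other job proceeds in parallel. Under MRI the global lock is released while a thread waits on an external process, so threads still help a gatherer that mostly shells out to tools.

```ruby
# Hypothetical sketch of a threaded gatherer. Each job is a block that
# collects one value; jobs marked exclusive (anything that shells out to
# lshw) share one Mutex, so at most one of them runs at a time while the
# other jobs proceed in parallel.
class ThreadedGatherer
  def initialize
    @exclusive_lock = Mutex.new
    @jobs = []
  end

  def add(name, exclusive: false, &block)
    @jobs << [name, exclusive, block]
  end

  # Runs every job in its own thread and returns { name => value }.
  def run
    results = {}
    results_lock = Mutex.new
    threads = @jobs.map do |name, exclusive, job|
      Thread.new do
        value = exclusive ? @exclusive_lock.synchronize { job.call } : job.call
        results_lock.synchronize { results[name] = value }
      end
    end
    threads.each(&:join)
    results
  end
end

# Example wiring (not run here):
#   g = ThreadedGatherer.new
#   g.add(:lshw_disk, exclusive: true) { `lshw -class disk` }
#   g.add(:lshw_net,  exclusive: true) { `lshw -class network` }
#   g.add(:lsusb) { `lsusb` }
#   g.run
```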

1/10/2013

This project has essentially been rewritten from the ground up to accommodate the new database type described here. Most of the regular expressions and tools are the same, but the database interface has been completely revamped (it is now a separate library). We collect fewer pieces of data, but each piece is more meaningful, and its logical context is now easier to ascertain from the database structure. The central identifier is now the FQDN of the node, which establishes its position as well as its hierarchy; the old "context-less unique identifier" is long gone.
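The flat resource/attribute model keyed by FQDN can be pictured with a small Ruby sketch. The INV_ attribute names come from the attribute list on this page. The `Resource` struct, the `node:device` naming of child resources, and the example domain are assumptions made for illustration. So is the nodeX-Y host pattern used to recover a node's position.

```ruby
# Hypothetical sketch of the flat resource / attribute model. INV_* names
# come from the attribute list; the Struct, the device naming scheme and the
# example domain are illustrative assumptions.
Resource = Struct.new(:name, :parent, :attributes)

# A node is keyed by its FQDN and has no parent.
def node_resource(fqdn, attributes = {})
  Resource.new(fqdn, nil, attributes)
end

# A device is a child resource of its node.
def device_resource(node, dev_name, attributes = {})
  Resource.new("#{node.name}:#{dev_name}", node, attributes)
end

# ASSUMES host labels of the form nodeX-Y: the first label gives the grid
# position, the remaining labels give the place in the domain hierarchy.
def position_from_fqdn(fqdn)
  host, *domain = fqdn.split('.')
  m = /\Anode(\d+)-(\d+)\z/.match(host)
  return nil unless m
  { x: m[1].to_i, y: m[2].to_i, domain: domain.join('.') }
end

node = node_resource('node1-1.grid.example.org', 'INV_cpu_type' => 'example-cpu')
eth0 = device_resource(node, 'eth0', 'INV_dev_type' => 'ethernet', 'INV_if_name' => 'eth0')
p position_from_fqdn(node.name)
p eth0.parent.name
```

Because every attribute hangs directly off a resource and devices only point at their parent node, the FQDN alone is enough to locate a node and everything attached to it.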

To launch the process, we now use a startup script instead of the old oml construct. Running inventory is then as simple as:

Image the nodes with the inventory image
Restart the nodes

While this method is effective, it has the drawback that each time we update gatherer, a new image has to be created. This is not a big deal: gatherer updates come so infrequently that the image usually ends up having to be updated anyway. The inventory startup script looks like this:

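#!/bin/bash
# inventory.sh: run the gatherer once at boot and log its progress to /tmp/inventory.log.
# update-rc.d also installs K99 links (runlevels 0, 1, 6) that call this script
# with "stop" at shutdown, so only act on "start".
case "$1" in
  start) ;;
  *) exit 0 ;;
esac
echo "Sleeping 30" > /tmp/inventory.log
sleep 30
# Extra random 0-9 second delay to stagger nodes that boot at the same time
let amt=$RANDOM%10
echo "Sleep $amt" >> /tmp/inventory.log
sleep $amt
# Record the NTP peer status alongside the inventory log
echo "NTP info" >> /tmp/inventory.log
ntpq --numeric --peers >> /tmp/inventory.log
echo "Starting gatherer" >> /tmp/inventory.log
ruby -I /root/ /root/gatherer.rb -d  -l /tmp/gatherer.log >> /tmp/inventory.log
echo "Done" >> /tmp/inventory.log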

Save this file as /etc/init.d/inventory.sh, make it executable (chmod +x /etc/init.d/inventory.sh), and then run:

update-rc.d inventory.sh defaults 99

The output should look like this:

root@node1-1:~# update-rc.d inventory.sh defaults 99
update-rc.d: warning: /etc/init.d/inventory.sh missing LSB information
update-rc.d: see <http://wiki.debian.org/LSBInitScripts>
 Adding system startup for /etc/init.d/inventory.sh ...
   /etc/rc0.d/K99inventory.sh -> ../init.d/inventory.sh
   /etc/rc1.d/K99inventory.sh -> ../init.d/inventory.sh
   /etc/rc6.d/K99inventory.sh -> ../init.d/inventory.sh
   /etc/rc2.d/S99inventory.sh -> ../init.d/inventory.sh
   /etc/rc3.d/S99inventory.sh -> ../init.d/inventory.sh
   /etc/rc4.d/S99inventory.sh -> ../init.d/inventory.sh
   /etc/rc5.d/S99inventory.sh -> ../init.d/inventory.sh

Note: For full enumeration of all devices on a baseline image, make sure you remove the modprobe blacklists. If you don't, the Wi-Fi modules won't get loaded and the gatherer won't find those devices.
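Before capturing a baseline image, it can help to list which modules are still blacklisted. The hypothetical Ruby helper below only reports the entries and changes nothing. It scans every regular file under a modprobe.d directory, strips comments, and collects lines of the form `blacklist <module>`. Removing or commenting out the entries remains a manual step.

```ruby
# Hypothetical helper: list the "blacklist <module>" entries found under a
# modprobe.d directory, so you can see which drivers (e.g. Wi-Fi modules)
# are being kept from loading before you capture a baseline image.
def modprobe_blacklist(dir = '/etc/modprobe.d')
  entries = []
  Dir.glob(File.join(dir, '*')).sort.each do |path|
    next unless File.file?(path)
    File.foreach(path) do |line|
      line = line.sub(/#.*/, '').strip       # drop comments and whitespace
      next unless line =~ /\Ablacklist\s+(\S+)/
      entries << [File.basename(path), $1]
    end
  end
  entries
end

# Usage (not run here):
#   modprobe_blacklist.each { |file, mod| puts "#{file}: #{mod}" }
```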

7/9/2014

It seems I missed a few details in the last round of revisions:

Current modifications include:

Detection of UHD
- Done by scraping the output of uhd_usrp_probe
Detection of NetFPGA
- Tries to load the NetFPGA module and enumerate the interfaces
Reporting of hard drive SMART information
- Scrapes the output of smartctl -a
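The SMART scraping can be sketched in Ruby as follows. This is a hypothetical illustration: the method names and the `/dev/sda` default are assumptions. The field labels match typical `smartctl -a` output for ATA drives, while SCSI and NVMe drives format some fields differently, so any field that isn't found is simply omitted from the result. The exit status is ignored on purpose, because smartctl's exit status is a bit mask that can be non-zero even when the output is usable.

```ruby
require 'open3'

# Hypothetical sketch of scraping `smartctl -a`. The labels match typical
# ATA output; fields that are not found are left out of the result.
SMART_FIELDS = {
  model:  /^Device Model:\s*(.+)$/,
  serial: /^Serial Number:\s*(.+)$/,
  health: /^SMART overall-health self-assessment test result:\s*(.+)$/
}.freeze

def parse_smartctl(text)
  SMART_FIELDS.each_with_object({}) do |(key, regex), info|
    m = regex.match(text)
    info[key] = m[1].strip if m
  end
end

# smartctl's exit status is a bit mask that can be non-zero even when the
# output is usable, so only the text is inspected here.
def smart_info(device = '/dev/sda')
  output, _status = Open3.capture2('smartctl', '-a', device)
  parse_smartctl(output)
end
```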

Support was also proposed for setting the system clock via NTP, but that was relegated to an OS startup script (we want to do it every time the node boots).

I had to make some modifications to play nicely with Ruby 1.9.2+, specifically:

#!/usr/bin/ruby -w
# gatherer.rb version 3.9 - Gathers information about various system data, and updates the web-based inventory via a REST wrapper.
#
# SMART support, updated rest_db interface with timeout support

require 'optparse'
require 'open3'
require 'find'
require 'singleton'
require 'net/smtp'

# Load the local libraries relative to this file
require_relative('./rest_db')
require_relative('./log_wrap')

The #! line had to change, and the local libraries now have to be loaded with require_relative, because Ruby 1.9.2 removed the current directory (".") from the load path. This was done in gatherer.rb and rest_db.rb.
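The load-path change can be demonstrated with a small Ruby sketch. It builds a throwaway two-file project in a temporary directory and runs the main script twice: once loading its sibling with plain `require` and once with `require_relative`. The file names `main.rb` and `inventory_greeting_demo.rb` are made up for the demo. RUBYLIB and RUBYOPT are cleared for the child process so that it sees Ruby's default load path.

```ruby
require 'tmpdir'
require 'open3'
require 'rbconfig'

# Sketch: run a throwaway main.rb whose first line is loader_line, from the
# directory that also holds its sibling library, and report whether it ran.
def try_loader(loader_line)
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'inventory_greeting_demo.rb'), "GREETING = 'hello'\n")
    File.write(File.join(dir, 'main.rb'), "#{loader_line}\nputs GREETING\n")
    env = { 'RUBYLIB' => nil, 'RUBYOPT' => nil }   # nil unsets the variable
    output, status = Open3.capture2e(env, RbConfig.ruby, 'main.rb', chdir: dir)
    [status.success?, output]
  end
end

p try_loader("require_relative 'inventory_greeting_demo'").first  # true
p try_loader("require 'inventory_greeting_demo'").first           # false on 1.9.2+
```

Plain `require` fails even though the library sits in the current directory. `require_relative` resolves the path against the requiring file, so it also works no matter which directory the script is launched from.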

Last modified on Jul 9, 2014, 7:49:31 PM
