
Optimizing Your SAS Performance with Grid Computing

ABSTRACT
Google has become extremely successful by developing an efficient search engine running on commodity hardware.  It no longer uses the old model of putting all its resources onto one supercomputer; instead, it spreads that processing across a cluster of smaller machines running in parallel to form a grid.  In 1965, Gordon Moore observed that the number of transistors per square inch on integrated circuits would double every year.  This trend became known as Moore's law, and it continues to make the ubiquitous and relatively inexpensive desktop and laptop computers more powerful.  This paper will discuss how you can cluster computers in a grid to optimize the execution of SAS programs.  Some of the techniques discussed include:

• Implementing supercomputer power with commodity hardware
• Submitting SAS programs sequentially while maintaining inter-program dependency
• Threading multiple groups of programs for optimal performance
• Measuring SAS performance with Statmark, a standard metric for cross-platform benchmarking of SAS processing
• Scheduling the execution of programs in a grid environment

In the world of Moore's law, it makes less sense to lay out a large capital investment for a single server.  Clustering inexpensive smaller machines, and dynamically adding new computers to this architecture within a grid, can scale your SAS computing resources in the way Google scaled its search engine.

INTRODUCTION
In analytics, as statistical models grow more sophisticated and datasets grow larger, computing resources are the much-needed engine that delivers results.  SAS has evolved along with hardware systems to use the horsepower needed to crunch statistical models and data manipulations.  When I first started working with SAS, it was on a mainframe computer system running TSO.  This was centrally controlled, with very limited user customization from a dumb terminal.  As computing chips got smaller, SAS processing started to move toward smaller UNIX servers.  Then the introduction of SAS on personal computers dramatically changed how most users performed their data exploration.  Users tried out their data models and reports on their PCs, although they still ran production jobs on a networked server.  This evolution continues as the desktop becomes more powerful.  With maturing technologies for connecting these desktop computers, PC desktops are beginning to form computing grids that can outperform traditional servers.  The forces driving this include the shrinking size and cost of computer chips even as their performance increases, coupled with the falling cost of memory and storage.  Together, these elements supply analytical tools such as SAS with a greater abundance of computing resources.  We are at a juncture in this evolution where how computing resources are utilized can matter more than simply obtaining them.

IT managers need to evaluate the cost of a server over its lifetime, since the price-to-performance ratio of computing resources diminishes over time.  It is similar to purchasing a car: the car does not get any faster, but its value constantly goes down.  Computing resources have an even lower return on investment, because they become obsolete very quickly; the next model is usually cheaper, yet outperforms the current server model.  It is therefore not always prudent to make a large capital expenditure on a piece of hardware when its performance-to-price ratio will diminish in such a short span of time.  Grid computing offers a different model, in which commodity hardware can be expensed at lower cost.  There is also better flexibility, because the grid can scale to match the performance needs of a growing group without necessarily throwing out the old server to replace it with a new one.  Nodes can be added, and older nodes can be taken off, like a living creature shedding dead skin.  In the grid, the newer nodes have the advantage of providing the best computing power available for the cost at that time.  Spreading out the capital expense on computing resources is analogous to the time-valued benefit of investing small amounts over your lifetime to form a balanced portfolio, instead of putting one big lump sum into a single stock.  It acts as a buffer against the ups and downs of the market.  In this case, it is not the financial market but rather the market of computing hardware cost.  As hardware continues to get cheaper for a given level of performance, software seems to get more expensive as its complexity increases.  Licensing SAS is not cheap, so it is wise to optimize the hardware that SAS runs on, since over time the hardware cost will be small compared to the software cost.

One of the key components in optimizing computing cost is the ability to measure the performance of your system with precision.  This metric can help you evaluate the return on your investment.  Without any form of measurement, it is like shopping for a credit card without being able to know what the APR or interest rates are.  This paper will introduce a free utility called Statmark, by MXI, that gives you the tools to make the right decisions about hardware implementations.  SAS Institute has had the technology to run its jobs on remote machines for many years with SAS/Connect.  It uses protocols such as TCP/IP to connect to a remote machine and run your program there.  SAS Grid computing leverages this, along with other software such as the Grid Manager, to optimize the performance of SAS across multiple nodes and make the best use of the computing resources within a grid.

Alternatively, MXI also offers a solution, Clustreion, which executes SAS programs within a grid architecture.  This paper will discuss how a grid computing environment can help you get the most out of your hardware.  In a resource-intensive analytics environment, it is wise to optimize your performance, given the dynamic environment and the diminishing returns on your hardware investments.
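
Two of the techniques listed in the abstract, submitting programs in order while maintaining inter-program dependency and threading groups of programs, can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions and is not how SAS Grid Manager or Clustreion is implemented. The program names, the `DEPS` table, and the `fake_run` placeholder are all hypothetical; a real runner would hand each program to a SAS session on a grid node.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_waves(deps):
    """Group programs into waves; each wave depends only on earlier waves."""
    remaining = {prog: set(needs) for prog, needs in deps.items()}
    done = set()
    waves = []
    while remaining:
        # A program is ready once everything it depends on has finished.
        ready = sorted(p for p, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("circular or missing dependency")
        waves.append(ready)
        done.update(ready)
        for prog in ready:
            del remaining[prog]
    return waves


def run_grid(deps, run_program, max_workers=4):
    """Run each wave in parallel threads; the waves themselves run in order."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in plan_waves(deps):
            # pool.map returns only after every program in the wave is done,
            # so the next wave never starts before its prerequisites finish.
            for prog, outcome in zip(wave, pool.map(run_program, wave)):
                results[prog] = outcome
    return results


def fake_run(program):
    # Hypothetical placeholder: stands in for submitting the program to a node.
    return f"{program}: ok"


# Hypothetical programs and their prerequisites (assumed for illustration).
DEPS = {
    "extract.sas": [],
    "demog.sas": ["extract.sas"],
    "labs.sas": ["extract.sas"],
    "report.sas": ["demog.sas", "labs.sas"],
}

if __name__ == "__main__":
    print(plan_waves(DEPS))
    print(run_grid(DEPS, fake_run))
```

In this sketch, `demog.sas` and `labs.sas` fall into the same wave and can run side by side, while `report.sas` waits for both. Grouping by waves is a deliberately simple design choice; a production scheduler might instead start each program as soon as its own prerequisites finish.
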
Architecture Topology of SAS Grid