diff --git a/s5demo.txt b/s5demo.txt new file mode 100755 index 0000000..bc6dd3c --- /dev/null +++ b/s5demo.txt @@ -0,0 +1,293 @@ +.. title:: + s5 DEMO!!!! - RandomCo Thoughts + +.. footer:: + $Id: s5demo.txt,v 1.1 2006/04/05 17:11:27 tundra Exp $ + + +.. contents:: + +Meeting Mechanics +----------------- + +- It would be best if we could get all the way through the presentation + in "Read Only" mode. This will allow you to understand the entire + analysis in context. This will be followed by an open-ended Q&A session. + +- Ed Murphy will present our findings using interpretive dance. + + +Possible Responses +------------------ + +- As you hear us present today, you will have one of three responses: + + - "Oh, I/we already know that." + - "Hmm, that's new to me - good point." + - "I thoroughly disagree with what you just said." + + Each of these is an important way for you to validate and/or act upon + our findings. + +Assumptions +----------- + +- A lot of baseline data such as machine inventories, levels of utilization, + application response time, and arrival rate profiles was either missing, + incomplete, or inconsistent across systems. Some data (such as pricing) + was completely unavailable to us. We've thus built the models using + estimates for certain critical datapoints. RandomCo can update these models + at will to make use of "real" data and thereby get better model output. + + +Key Findings +------------ + +- The RandomCo IT culture is end-user/project focused to a fault. + Infrastructure is (mostly) enhanced incrementally and is not treated + as a common asset designed to serve the entire breadth of + applications. + +- This has led to an overprovisioning of some classes of servers + (Windows) and a greater variety of system types (Unix) than is + strictly necessary. + +- Operational disciplines such as asset inventory control, measurement, + and reporting vary greatly in depth and quality. 
This makes it hard + to manage what is not consistently measured. + +- Business, Architecture, Development, Infrastructure, and + Operations are not bound together with a common overarching view of + IT at RandomCo. There is a tendency for each of these to operate + more as a silo. The Architecture team tends to have the broadest + view of these issues. + +- Customizations to key business subsystems such as SAP and Manugistics + are creating a high degree of operational complexity that may + not be justified. + +- The strict commitment to a Windows/.NET-only development environment + is unnecessarily constraining the organization's agility, time-to-market, + and ability to control costs. + + +Core Themes +----------- + +- Infrastructure provisioning needs to move from a project-centric + model to an Enterprise-wide service model. + + 1) This will maximize reusability of extant infrastructure assets. + + 2) This will enable a systemic perspective for provisioning new + infrastructure with attendant economies of scale. + + 3) The current complexity, variety, and underutilization of systems + at RandomCo is a direct consequence of making project-based + infrastructure decisions. No amount of migration/consolidation + will make a permanent difference if the underlying root cause + practice that caused the situation in the first place is not + addressed. + +- Measurement, Monitoring, and Management need to be made more + consistent and reach more widely across the IT operational + environment: + + 1) Basic asset information such as server inventory, software + revision levels, machine age, and so on varies widely by + platform. This makes business case cost calculations for new + initiatives difficult, and in some cases, impossible. + + 2) Today there is wide variation in the depth and quality of + performance and capacity metrics available across all the + datacenter assets. 
This makes tuning and capacity planning a + vertical, per-server activity (if at all), rather than a + systemic infrastructure concern. + + 3) In short, "You Cannot Manage What You Do Not Measure." + +- RandomCo today is already making good use of virtualization + in the "Big Box" Unix and Mainframe areas. This needs to be + extended to the Windows-class servers as well. + + 1) This will allow more efficient use of existing server capacity. + + 2) This will enable rapid resource (re)provisioning on a project + or perhaps even an event basis. + + 3) This will decouple applications software from underlying + operating environments by testing and certifying the application + to the *virtual OS*, not the physical hardware. This will + materially reduce the retesting burden currently incurred when + hardware is upgraded or changed. + +- RandomCo should begin the necessary steps to reduce the number of + different Unix variants within the IT organization and reduce its + total dependence on Windows as a server platform. Wherever + possible, these should be migrated to SLES Linux across the required + breadth of hardware. RandomCo will benefit in doing so because: + + 1) This creates a common operational platform thereby reducing + training cost and maximally leveraging the employee skill set. + + 2) This makes the organization hardware-agnostic thereby providing + negotiation leverage with the hardware vendors. + + 3) The net software licensing cost should drop significantly: + + a) RandomCo already has an Enterprise License for SLES. + + b) SLES will be bundled with XEN virtualization in future + releases. This should be considerably less expensive than the + separate licensing of Windows and VMWare on today's servers. + + 4) The first candidate for elimination is AIX. 
+ + +- RandomCo needs to embrace Linux as a development platform for its own + customized software: + + 1) This will give it many more degrees of freedom in how it designs, + deploys, and operates its own applications. + + 2) This will open the door to cost reduction by replacing + expensive enabling components (like IIS) with free or very + inexpensive open source equivalents (like Apache). + + 3) This will enable "scale" at the *organizational* level. Today, + there is a significant difference in worldview, skillset, and + approach between the Windows developers and the rest of the + RandomCo IT community. By moving to make Linux one of the common + development platforms, RandomCo will open the door to having the + in-house applications it develops run on everything from an + entry-level machine through an Enterprise-class mainframe. This + will be done with a common set of development tools, + technologies, and *people* across the organization, with a far + stronger alignment between Architecture, Development, and + Operations. + + +Specific Technical Recommendations +---------------------------------- + +- There are a number of areas for improvement that are "Quick Hits". + These are relatively low risk/low complexity and can be acted + upon fairly quickly: + + 1) Audit all printers and replace any that are still using PC + print server hosts with direct network connected printers. + + 2) Migrate the datacenter core LAN fabric from 100 BaseT to + Gigabit Ethernet everywhere. + + 3) Build out the datacenter switch topology to accommodate more + ports, be 1G capable, and accommodate future growth. Get rid + of the daisy-chained switches used today. + + 4) Build an IP-connected NAS in the datacenter and migrate all the + corporate file servers away from locally attached storage to + the NAS to provide consolidated storage, backup, management, & + recovery. 
(It may be the case that it is easier/more consistent + to actually mount this on the existing SAN and expand the + SAN capacity accordingly.) + + 5) Continue/accelerate the path to virtualizing Dev/Test/QA + images. BUT, place the provisioning of these images into the + hands of the infrastructure organization, not each and every + disparate development project. + +- The QIP DNS infrastructure needs to be audited: + + 1) Revisit the overall Enterprise DNS architecture and make sure + it still makes sense. + + 2) Ensure that the versions of 'bind' and 'dhcpd' deployed in QIP + are new enough to overcome the known security holes of the + older versions of these tools. + + 3) The competitive landscape should be revisited here to see if there + are better/newer/cheaper integrated DNS solutions. + + 4) Examine the possibility of augmenting standard "bare" 'named' and + 'dhcpd' with open source or commercial DNS/DHCP configuration tools. + +- The Windows server farm provides a strong opportunity for *consolidation*: + + 1) Many machines are lightly utilized and thus can be consolidated + via virtualization. + + 2) The data on Windows server utilization is spotty at best. Instead + of attempting to analytically determine which servers to virtualize, + do so *empirically*, as follows: + + a) Select the servers that today represent the least powerful + 20% of Windows servers. + + b) Begin adding servers from that 20% virtually to a target + machine *while monitoring utilization*. When the machine + hosting the virtual server images reaches some threshold + average utilization (we suggest 75%), consider it "full", + and start adding virtual servers to the next physical machine. + + c) Over time, you will discover what a reasonable average + level of utilization is for the machines hosting the virtual + servers and thus how many such virtual images a given class of + hardware can support. 
(The business case assumes consolidation + ratios of 3:1, 2:1, and 1:1 for small, medium, and large class + servers respectively.) + +- The Unix server farm offers some opportunities for *migration*: + + 1) SAP needs to be migrated to run on Linux instead of AIX. + + 2) The various flavors of Oracle currently in use need to be + migrated to a high-availability Oracle RAC environment, running + on Linux on either the existing Z-Series mainframe or a new + farm of purpose-built Linux servers. + +- The Unix server farm offers some slight opportunity for *consolidation*: + + + +- There is a meaningful opportunity for Linux/Open Source in the Retail + Store environment: + + + +- A detailed analysis/audit of the core FLEX pricing algorithms needs + to be undertaken to determine whether more hardware or better + algorithms (or both) can be brought to bear: + + 1) Need to determine the nature of the computational constraint. + + 2) We suspect the problem being solved is "NP-Complete". If so, + there needs to be an investigation of improving/introducing + bounding heuristics to improve computation speed. + + +Other Scenarios Considered +-------------------------- + +- Retail Store Server Consolidation + + 1) We examined the possibility of collapsing the 4 servers currently + used in each store into 2 larger servers. + + 2) This scenario is currently a nonstarter because new hardware is + still being rolled out to the stores this year. The cost recovery + thus isn't there to justify a store server consolidation. + + +Major Risks +----------- + +- The absence of "before" SLA metrics means that any consolidations or + other changes made to the system may get blamed for subsequently + seen "poor" performance. When this happens there is no way to + compare the "after" to the "before" conditions. Senior management + needs to understand this and be prepared to manage through it. + + + +