Running a software company is extremely challenging. Every company, even a successful one, constantly struggles to balance finding the right product for the right customer, keeping up with changing market demand, bringing in money, maintaining product quality, controlling the pace of company evolution, and many other conditions for success. As a result, 99% of them close within the first few years.
What defines a company as successful?
This work is a set of qualities that contribute to company success, based on the industry's best practices. It is structured around organizational departments and focuses on simplicity, standards, information accessibility, and monitoring.
Improve the rate at which the company's net worth increases and protect the company from undesired closure.
Go over the items in the list and pick the ones you believe are vital for your company. Tick the "Save" checkbox next to those items; you will be able to print them (or save them to PDF) later by clicking the "Print" button in the right sidebar.
If you are unsure what an item means, use your favorite search engine and other means to do proper research. Refer to Terminology for a better understanding of the terms used. If an item is out of place, outdated, poorly described, or misspelled, improve it. See contributing.
Having the list of potential action items, take some time for research. Consult with the people you work with and plan how to implement the change. Always remember that the goal of your company is to make money, as much and as fast as possible; avoid being distracted by ideas that do not help the company achieve its goal.
Implement the change and measure success.
When the changes are implemented, there should be a noticeable improvement in the company's performance. The rate of making money should increase; inventory and operational expenses should decrease. This is the most desirable outcome. If it didn't happen, try to understand why. Did you measure things properly? What other changes were introduced in parallel to yours?
Share the results with me (leonid@komarovsky.info), I will be glad to know about your challenges and progress.
TBD
This work is licensed under a CC Attribution-NonCommercial-NoDerivatives 4.0 International License.
This page provides text annotations to enable public discussion and personal note-taking (a great service by hypothes.is). To start a new discussion, simply select any text on the page and click "Annotate." I periodically summarize public annotations and update this document with relevant content.
You can also contact me directly at leonid@komarovsky.info with any questions or ideas you want to discuss personally.
Know someone who might need to use this work? Share the link with them.
Please donate to help improve and maintain this project. Here are some ideas for features that could be added:
Operations Department is responsible for defining and controlling cross-company processes focused on increasing the throughput of making money, while simultaneously reducing both inventory and operating expense.
Save | |
---|---|
Performance of every group in the organizational hierarchy is measured by a set of Key Performance Indicators (KPIs) | |
Key Performance Indicators of every group in the organizational hierarchy are aligned with company's objectives | |
Values of Key Performance Indicators of every group in the organizational hierarchy are visible to the group members (for example wall mounted displays showing KPIs) | |
Key Performance Indicators of every group in the organizational hierarchy are periodically reevaluated and updated | |
The number of Key Performance Indicators per group in the organizational hierarchy is between 3 and 7 | |
Both leading and lagging Key Performance Indicators of every group in the organizational hierarchy are monitored |
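The two structural checks above (3 to 7 KPIs per group, with both leading and lagging indicators present) can be automated. Here is a hedged sketch; the group names, metric names, and data shape are hypothetical:

```python
# Hypothetical KPI registry: each group lists its leading and lagging indicators.
GROUP_KPIS = {
    "platform-team": {
        "leading": ["deploys_per_week", "wip_count"],
        "lagging": ["uptime_pct", "mttr_hours"],
    },
    "support-team": {
        "leading": ["first_response_minutes"],
        "lagging": ["csat_score", "ticket_backlog"],
    },
}

def audit_kpis(groups):
    """Return human-readable findings for KPI hygiene per group."""
    findings = []
    for group, kpis in groups.items():
        total = len(kpis.get("leading", [])) + len(kpis.get("lagging", []))
        if not 3 <= total <= 7:
            findings.append(f"{group}: {total} KPIs (expected 3-7)")
        if not kpis.get("leading"):
            findings.append(f"{group}: no leading indicators")
        if not kpis.get("lagging"):
            findings.append(f"{group}: no lagging indicators")
    return findings
```

A periodic job could run this audit and surface the findings on the group's KPI display.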
Save | |
---|---|
Work planning is focused on a predetermined clear strategy, aligned with business goals and time/resource constraints | |
Work is planned with long-term thinking in mind | |
Work is planned for 2-4 weeks ahead | |
At the end of the planning process, priorities, objectives, and responsibilities on a personal level are clear | |
Planned changes are divided into small, incremental changes that can be completed in a week or less | |
Work backlog is small |
Save | |
---|---|
Work progress, required effort, and product quality are monitored on a task level | |
All work is categorized as "planned", "unplanned" and "abandoned" | |
The amount of unplanned work is monitored and continually reduced | |
The amount of abandoned work is monitored and continually reduced | |
The amount of work in progress (WIP) is monitored and limited | |
Work processes and procedures are clear to management and employees | |
Work processes and procedures are documented | |
Work processes and procedures are constantly reevaluated and improved | |
Work processes and procedures are actively automated | |
Work processes and procedures do not slow people from achieving great work |
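The work-monitoring items above (planned/unplanned/abandoned categories, WIP limits) can be sketched as a small report over a task list; the data shape and the limit value are assumptions for illustration:

```python
# Illustrative sketch: classify work items and check a work-in-progress limit.
from collections import Counter

WIP_LIMIT = 5  # arbitrary example limit

def work_report(items):
    """items: list of {'status': 'planned'|'unplanned'|'abandoned',
    'in_progress': bool}. Returns category counts and a WIP-limit verdict."""
    counts = Counter(item["status"] for item in items)
    wip = sum(1 for item in items if item.get("in_progress"))
    return {"counts": dict(counts), "wip": wip, "wip_ok": wip <= WIP_LIMIT}
```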
Save | |
---|---|
There is a centralized Instant Messaging service, used for daily communications throughout the organization | |
The Instant Messaging service provides search within conversation history | |
It is possible to create topic-based communication channels in the Instant Messaging service | |
Important notifications such as monitoring alerts are immediately communicated via Instant Messaging service | |
It is possible to actively handle incidents using Instant Messaging service (ChatOps) | |
Critical services support in-service communications, allowing service-specific items to be discussed without leaving the service
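Routing monitoring alerts into chat, as described above, usually comes down to an HTTP POST to an incoming-webhook endpoint. A minimal sketch, assuming a generic JSON webhook; the URL and payload field names are placeholders, since real chat services define their own schemas:

```python
# Sketch: deliver a monitoring alert to a chat channel via an incoming webhook.
import json
import urllib.request

def build_alert_payload(text, channel="#alerts"):
    """Serialize an alert message for a (hypothetical) incoming webhook."""
    return json.dumps({"channel": channel, "text": text}).encode("utf-8")

def send_chat_alert(webhook_url, text, channel="#alerts"):
    req = urllib.request.Request(
        webhook_url,
        data=build_alert_payload(text, channel),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # performs a real network call
        return resp.status
```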
Save | |
---|---|
Knowledge sharing meetings are performed periodically | |
Technical specifications of every product, such as requirements, architecture, and technologies used, are easily accessible to members from other teams | |
There is an easy-to-use directory of experts, including their expertise and contact information | |
There is a centralized Knowledge base | |
Any knowledge possibly reusable by company employees is stored in the Knowledge base | |
The Knowledge base is periodically reorganized and is kept up to date | |
Every team has a person responsible for Knowledge management | |
There is a dedicated team responsible for Knowledge management | |
Knowledge management is monitored |
Save | |
---|---|
Every team consists of 5 to 8 members | |
Every team consists of members with different roles, allowing it to plan, develop, build, test, deploy, and monitor software and infrastructure changes within the same team | |
The induction process for a new team member is shorter than two weeks | |
There are dedicated teams that provide and manage internal, cross-company tools |
Save | |
---|---|
TBD |
Save | |
---|---|
Percentage of done out of planned work is monitored | |
Work progress, effort, and quality are monitored | |
The ratio of unplanned to planned work is monitored | |
The amount of work in progress (WIP) is monitored | |
The amount of abandoned work is monitored | |
The number of meetings for every employee is measured and analyzed on a weekly basis | |
"Vanity metrics" such as lines of code produced and functions created are considered counterproductive and are NOT monitored | |
Competition monitoring such as team leaderboards is considered counterproductive and is NOT used |
Human Resources Department is responsible for recruiting skilled and motivated staff and maximizing its performance.
Save | |
---|---|
TBD |
Save | |
---|---|
Automated surveys collecting actionable information from employees are conducted periodically | |
Employee morale is monitored | |
Employee job satisfaction is monitored | |
Employee motivation is monitored | |
Employee turnover is monitored |
Marketing Department is responsible for identifying and profitably satisfying customer demand.
Save | |
---|---|
TBD |
Finance Department is responsible for providing financial insights vital to the company's current wellness and its ability to make future strategic decisions.
Save | |
---|---|
TBD |
Information Technology Department is responsible for providing technological infrastructure to support business operations.
Save | |
---|---|
It is possible to assess cost-effectiveness, usage, resource utilization, performance, and quality per component, application, and module
Save | |
---|---|
Logs, metrics, data dumps, screenshots and other potentially important qualitative and quantitative data is collected | |
Collected data is stored in a centralized system | |
There are clear retention policies for collected data | |
The quality of collected data is periodically reviewed and improved | |
The log format is standardized across the company | |
Collected data always contains the time, origin and a descriptive message | |
Metric naming is standardized across the company | |
Metrics are periodically reviewed for correctness and relevance |
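A standardized log format, as required above, typically means every record carries at least a timestamp, an origin, and a descriptive message in a machine-parsable shape. A minimal sketch; the field names are assumptions:

```python
# Sketch of a company-wide structured log line.
import json
from datetime import datetime, timezone

def log_line(origin, message, **context):
    """Emit one structured, machine-parsable log line with the mandatory
    time/origin/message fields plus optional context."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "origin": origin,
        "message": message,
        **context,
    }
    return json.dumps(record, sort_keys=True)
```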
Save | |
---|---|
It is possible to query collected data | |
It is possible to transform, combine, and perform computations on collected quantitative data | |
It is possible to perform trend analysis, including trend prediction on collected quantitative data | |
It is possible to visualize data queries using graphs, diagrams, tables and maps | |
It is possible to create custom dashboards with visualizations of data queries | |
It is possible to create scheduled reports based on data queries | |
Collected data is analyzed and translated into informative monitoring system events | |
Monitoring system events contain as much contextual information as possible | |
Events with contextual relationship are grouped into higher level events | |
It is possible to generate monitoring system events manually using easy-to-use web interfaces and APIs | |
The quality of visualizations, dashboards, and monitoring system events is periodically reviewed and improved |
Save | |
---|---|
Alerts are created for monitoring system events | |
Alerts are created only for actionable events | |
Lower-level alerts are suppressed when a related higher-level alert has already been sent | |
Incidents are always recorded and analyzed | |
Incident handling processes are documented | |
Incident reports always contain issue summary, timeline, root cause, resolution and recovery, and corrective and preventative measures | |
Alerts contain reference to documentation explaining how to handle the incident | |
Escalation plans are documented | |
Alerts unacknowledged within expected time are automatically escalated according to escalation plans | |
Automated remediation is performed only for issues that the company does not have control over, such as failing hardware in an external datacenter | |
The quality of alerts and escalation plans is periodically reviewed and improved |
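The suppression rule above can be sketched as a simple severity comparison per service; the severity scale and data shape here are assumptions:

```python
# Sketch: withhold a lower-level alert while a related alert of the same or
# higher severity is already active for the same service.
SEVERITY = {"info": 0, "warning": 1, "critical": 2}

def should_send(alert, active_alerts):
    """alert and active_alerts entries: {'service': str, 'severity': str}."""
    for active in active_alerts:
        if (active["service"] == alert["service"]
                and SEVERITY[active["severity"]] >= SEVERITY[alert["severity"]]):
            return False  # a same-or-higher-level alert is already active
    return True
```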
Save | |
---|---|
There is an external, independent system that monitors the Monitoring System's health | |
When the Monitoring System is experiencing issues, management and team members are notified immediately | |
When the Monitoring System is out of service, standalone monitoring for critical components is working, and results are being recorded locally | |
When the Monitoring System is out of service, it is possible to perform a failover to an alternative Monitoring System |
Save | |
---|---|
TBD |
Save | |
---|---|
Rate of contributions to the centralized Knowledge base is monitored | |
Rate of knowledge sharing sessions is monitored |
Save | |
---|---|
Lead time for release is monitored | |
The release rate is monitored | |
Time to restore service is monitored | |
Release failure rate is monitored | |
Infrastructure state drift between coded and actual is monitored |
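The release metrics above can be computed from a plain release log; the data shape here is an assumption for illustration:

```python
# Sketch: compute release rate, failure rate, and average lead time.
def release_metrics(releases):
    """releases: non-empty list of {'lead_time_hours': float, 'failed': bool}."""
    total = len(releases)
    failures = sum(1 for r in releases if r["failed"])
    return {
        "release_count": total,
        "failure_rate": failures / total,
        "avg_lead_time_hours": sum(r["lead_time_hours"] for r in releases) / total,
    }
```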
Save | |
---|---|
TBD |
Save | |
---|---|
Percentage of automated tests is monitored | |
Test execution times are monitored for all test levels and types | |
Test efficiency is monitored to detect false-positive or inefficient tests | |
Test usage is monitored to detect unused tests |
Save | |
---|---|
There is a centralized System Monitoring System | |
There is a clear definition of system components, critical to the organization’s operational well-being | |
Key requirements for system's availability, stability, performance, throughput, and security indicators are defined and documented | |
Resource usage, state and health of every process are monitored | |
Resource usage, state and health of every host are monitored | |
Resource usage, state and health of every system component are monitored | |
Resource usage, state and health of the infrastructure are monitored | |
There is a documented dependency map that makes it possible to understand how any failure affects the rest of the system | |
It is possible to identify system issues using a bottom-up approach, starting at the process level | |
It is possible to identify system issues using a top-down approach, starting at system component level | |
Software license compliance is monitored | |
Expiry dates of software licenses are monitored | |
Expiry dates of domain name registrations are monitored | |
Expiry dates of SSL certificates are monitored | |
Hosts and applications have one of the following states: In-service (ex. OK), Unknown, Out-of-service (ex. critical), Some-issues (ex. warning), Recovered, Unstable (ex. Flapping) |
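The host/application states listed above can be modeled as an enumeration; the string labels mirror the checklist's examples:

```python
# Sketch: the checklist's host/application states as an enumeration.
from enum import Enum

class HostState(Enum):
    IN_SERVICE = "ok"
    UNKNOWN = "unknown"
    OUT_OF_SERVICE = "critical"
    SOME_ISSUES = "warning"
    RECOVERED = "recovered"
    UNSTABLE = "flapping"

def from_label(label):
    """Map a raw monitoring label (case-insensitive) to a HostState."""
    return HostState(label.lower())
```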
Save | |
---|---|
There is a centralized easy-to-use Network Monitoring System | |
There is an up-to-date, easily accessible inventory of hosts and network equipment | |
It is possible to review the history of inventory changes | |
Hosts and network equipment of the entire network are discovered automatically | |
Network topology is discovered using network tomography and SNMP or Route analytics | |
There is a graphical representation of the network | |
It is possible to arrange hosts and network equipment into user-defined logical groups | |
It is possible to define dependencies between groups of hosts and network equipment | |
It is possible to separate network and application issues | |
It is possible to detect the specific layer of the OSI model at which the issue occurred | |
Network performance is monitored | |
Network availability is monitored | |
Network uptime is monitored | |
Network reliability is monitored |
Save | |
---|---|
There is a centralized easy-to-use Application Performance Management System | |
It is possible to record and replay sessions of user activity | |
Alerts about application performance and the end-user experience are actionable | |
There are automated remediation processes triggered in response to application performance issues | |
It is possible to identify any system changes such as new releases or insufficient system resources, responsible for exceptional behavior |
Save | |
---|---|
It is possible to identify issues experienced by end-users, using a top-down approach | |
The flow of user actions and responses to them across all system's components is logged | |
User interactions with the system are grouped by session, allowing only relevant events to be traced and analyzed | |
Slow responses to user actions are identified | |
Rate and percentage of slow responses to user actions are monitored | |
Unexpected responses to user actions are identified | |
Rate and percentage of unexpected responses to user actions are monitored |
Save | |
---|---|
Business transactions are clearly identified and documented | |
It is possible to search for business transactions based on context and content such as time of arrival or transaction type | |
Slow business transactions are identified | |
Rate and percentage of slow business transactions are monitored | |
Unsuccessful business transactions are identified | |
Rate and percentage of unsuccessful business transactions are monitored | |
Suspicious business transactions are identified | |
Synthetic end-user transactions are clearly defined and documented | |
Synthetic end-user transactions are performed periodically |
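A synthetic end-user transaction like those above usually amounts to a timed request against a known endpoint, with the result classified as slow or unexpected. A hedged sketch; the URL, threshold, and field names are placeholders:

```python
# Sketch: a periodic synthetic transaction against a health endpoint.
import time
import urllib.request

SLOW_THRESHOLD_SECONDS = 2.0  # arbitrary example threshold

def classify_response(status, elapsed):
    """Classify one observed response by status code and elapsed seconds."""
    return {
        "unexpected": status != 200,
        "slow": elapsed > SLOW_THRESHOLD_SECONDS,
    }

def run_synthetic_check(url):
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            status = resp.status
    except OSError:
        status = 0  # treat network failure as an unexpected response
    return classify_response(status, time.monotonic() - start)
```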
Save | |
---|---|
It is possible to identify application performance issues using bottom-up approach, starting at system component level | |
Servers, networks, storage, applications and services within the environment are automatically discovered by Application Discovery and Dependency Mapping tools (ADDM) | |
Transactions and applications are automatically mapped to underlying infrastructure components | |
Servers, networks, storage, applications and services within the environment have up/down state monitoring |
Save | |
---|---|
It is possible to identify application performance issues using bottom-up approach, starting at the code level | |
Call stack of application code execution and the timing associated with each method are recorded and monitored | |
Communications with services external to each module are recorded and monitored |
Save | |
---|---|
It is easy to correlate application performance data from various sources to provide actionable information | |
Simple reports with application performance information are sent periodically to stakeholders and team members. |
Save | |
---|---|
TBD |
Save | |
---|---|
TBD |
Save | |
---|---|
TBD |
Save | |
---|---|
Security is a priority in the organization | |
There is a thorough, up-to-date documentation of authentication and authorization mechanisms, network architecture, storage and hardware access | |
Security-related documentation access is restricted and audited | |
There is complete trust and transparency between the people responsible for development, operations, and security | |
Probability and impact of security risks are clear to everyone | |
Incremental improvements are preferred to following a detailed security roadmap | |
Security practices improve on each step of the Delivery Pipeline | |
Third-party software is standardized | |
Automated audit trails are implemented across all systems | |
Preparedness is tested with Security Games | |
Security hardening process is not slowing down the pace of business activities | |
Automation of security processes is of high priority | |
Security reviews are conducted periodically | |
Threat modeling and risk assessment are conducted periodically | |
All IT and R&D employees are immediately notified about any security vulnerabilities detected in the system |
Save | |
---|---|
Definitions of users, groups, roles, and privileges are stored in Source Control | |
Management of users, groups, roles, and privileges is performed by explicitly using code stored in Source Control | |
It is possible to roll back definitions of users, groups, roles, and privileges to a known good state in response to any detected aberrations | |
All users, groups, roles, and privileges are carefully discussed, and access to resources is granted on a need-to-know basis | |
The practice of assigning the least-privilege model of access is applied whenever possible | |
Any privileged accounts are closely monitored for changes |
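Keeping role and privilege definitions as data in Source Control, as described above, also makes least-privilege audits straightforward to script. A sketch; the role names and privilege strings are hypothetical:

```python
# Sketch: role definitions as version-controlled data, plus a least-privilege
# audit that finds granted-but-unused privileges.
ROLES = {
    "developer": {"repo:read", "repo:write", "ci:trigger"},
    "auditor": {"repo:read", "logs:read"},
}

def excess_privileges(role, privileges_used):
    """Privileges granted to a role but never exercised - removal candidates."""
    return ROLES[role] - set(privileges_used)
```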
Save | |
---|---|
There is an always-up-to-date inventory of hardware, software and information assets | |
New assets are discovered automatically within minutes | |
It is easy to determine the team or person responsible for any asset | |
Changes in existing assets are validated as soon as they appear in the inventory | |
Any aberrations are automatically communicated to the responsible team or person | |
It is possible to roll back any inventory item to a known good state in response to any detected aberrations |
Save | |
---|---|
The Configuration Management System continuously applies configuration standards to new systems and enforces those standards on systems that deviate from them | |
There is an easily accessible catalogue of "Golden Images" with predefined core functionality, such as identity management, configuration management, secrets-as-a-service and audit |
Save | |
---|---|
TBD Encrypted communication channels |
Save | |
---|---|
Logs and events generated by services, applications and operating systems are automatically collected and sent to a central platform | |
Logs and events affecting data are automatically collected and sent to a central platform | |
Logs and events generated by services, applications and operating systems are closely monitored with Security Information and Event Management (SIEM) tools | |
It is possible to roll back the system to a known good state as a response to any aberration detected with Security Information and Event Management | |
Continuous Security Monitoring is fully implemented |
Save | |
---|---|
Automated dynamic and static code analysis is performed as part of the delivery cycle | |
External host vulnerability scanning is performed periodically | |
Internal agent-based host vulnerability scanning is performed periodically | |
External network vulnerability scanning is performed periodically | |
Internal network vulnerability scanning is performed periodically |
Research and Development Department is responsible for engineering a highly versatile product of high quality, according to the Marketing department's requirements.
This section should be considered equally for both infrastructure and application development.
Save | |
---|---|
The variety of technologies in use is small | |
Technical debt is monitored and removed periodically |
Save | |
---|---|
Architectural and design decisions are documented along with their context and consequences | |
Architecture is evolutionary and supports incremental change across multiple dimensions | |
Systems, components, and modules are loosely coupled | |
Replacing a technology with an alternative is theoretically possible | |
Duplications in systems, components, and modules are periodically identified and minimized | |
Services, APIs etc. are treated as products for internal customers | |
Service provider-consumer contracts are documented | |
Service evolution is possible without violating existing provider-consumer contracts | |
Service provider-consumer contracts are automatically tested | |
Service provider-consumer contracts specify quality of service characteristics | |
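An automated provider-consumer contract test, as required above, can be as small as checking a sample provider response against the consumer's declared expectations. A minimal sketch; the contract shape and field names are hypothetical:

```python
# Sketch: verify a provider response against the consumer's contract.
CONTRACT = {"required_fields": {"id", "status", "created_at"}}

def satisfies_contract(response):
    """True if the provider response contains every field the consumer needs;
    extra fields are allowed, so the provider can evolve without breakage."""
    return CONTRACT["required_fields"] <= set(response)
```

Allowing extra fields is what lets the service evolve without violating the existing contract.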
Save | |
---|---|
There is an easy-to-use catalog of libraries and tools used in the company | |
There is a self-service allowing required tools to be provisioned on a developer's machine | |
The process of working with development tools is thoroughly documented |
Save | |
---|---|
Environment management, including provisioning and application deployment, is fully automated | |
There is a self-service process for provisioning of Development and Test environments | |
There is a self-service process for application deployment to Development and Test environments | |
The development process happens in an isolated environment and does not affect the work of other team members | |
It is possible to provision an isolated datastore including a lightweight version of data used in production | |
It is possible to debug and profile applications in Development and Test environments | |
It is possible to access logs and metrics of applications and infrastructure, running in the Development or Testing environment, at any time |
Save | |
---|---|
There is a centralized Source Control system | |
Code changes are submitted to Source Control as task-level commits | |
Code changes are submitted to Source Control at least once a day | |
Source Control branches or forks are used to isolate work on every task | |
Code submission to Source Control triggers automated build and test processes | |
Only fully tested, production-ready code is integrated into the main branch | |
Code freeze practice does not exist |
Save | |
---|---|
Coding conventions are documented | |
Working code is preferred over comprehensive code documentation | |
Code changes are organized on a task level | |
Existing code is easy to maintain and extend over time | |
Every module or class has responsibility for a single part of the functionality provided by the software | |
Every module, class, or function is open for extension but closed for modification | |
Replacing an object of class A with an object of class B, which is a subclass of A, will not break the program | |
Interfaces are small and defined specifically for interaction between specific suppliers and consumers | |
High-level modules do not depend on low-level modules - both depend on abstractions | |
Abstractions do not depend on details; instead, details depend on abstractions | |
Feature toggles are used to temporarily hide task-level code changes from end-users, without changing the code | |
Feature toggles are categorized by their purpose as "release toggles", "operations toggles", "experiment toggles", and "permission toggles" | |
Feature toggles' category-specific longevity and dynamism are monitored | |
Feature toggles are periodically reviewed and cleaned up | |
There is a centralized system for feature toggles management |
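The feature-toggle items above can be sketched as a tiny toggle store with category-based lookup to support the periodic review; the in-memory dict stands in for a centralized toggle-management service, and all toggle names are hypothetical:

```python
# Sketch: feature toggles tagged with the four checklist categories.
TOGGLES = {
    "new-checkout": {"category": "release", "enabled": False},
    "read-only-mode": {"category": "operations", "enabled": False},
    "pricing-test": {"category": "experiment", "enabled": True},
    "beta-dashboard": {"category": "permission", "enabled": True},
}

def is_enabled(name, default=False):
    """Check a toggle at runtime without changing application code."""
    return TOGGLES.get(name, {}).get("enabled", default)

def toggles_by_category(category):
    """Support the periodic review: list toggles of one category."""
    return sorted(n for n, t in TOGGLES.items() if t["category"] == category)
```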
Save | |
---|---|
TBD |
Save | |
---|---|
TBD |
Save | |
---|---|
Application building process is automated | |
It is possible to run application building process with a single command | |
Application building processes are executed on a dedicated machine | |
Every application building process is executed in an isolated workspace | |
Build dependencies are stored in a centralized Package Management System | |
Static Code Analysis is performed automatically during application build process execution | |
Unit tests are performed automatically during application build process execution | |
When application building process execution fails, an alert is sent to the person who triggered the build (and anyone else who is relevant) | |
When application building process execution fails, it is the highest priority to fix it | |
Any application build result can be recreated from Source Control | |
Build report is generated when a building process is done | |
Build reports are accessible at any time |
Save | |
---|---|
There is a centralized Package Management System | |
Build results are automatically versioned and tagged | |
Build results are automatically stored in Package Management System | |
Build results contain information that connects them to a specific build process execution and code revision | |
The build package contains all relevant information required to automatically provision needed infrastructure, set up monitoring, and deploy the package |
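Connecting a build result to its version, code revision, and build-run identifier, as the items above require, is commonly done with a small manifest stored next to the artifact. A hedged sketch; all field names and the artifact naming scheme are assumptions:

```python
# Sketch: a build manifest tying an artifact to its revision and build run.
import json

def build_manifest(version, git_sha, build_number):
    return json.dumps({
        "version": version,
        "revision": git_sha,
        "build": build_number,
        "artifact": f"app-{version}+{git_sha[:7]}.tar.gz",
    }, sort_keys=True)
```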
Save | |
---|---|
Testing processes use a centralized Monitoring System to determine whether tests passed or failed, whenever possible (see Monitoring) | |
Functionality of each product and its components is tested | |
Reliability of each product and its components is tested | |
Usability of each product and its components is tested | |
Efficiency of each product and its components is tested | |
Maintainability of each product and its components is tested | |
Test strategy is clear and documented | |
Test plans are documented in detail and are accessible at any time | |
Tests are categorized by level and type | |
It is possible to run specific level or type of tests | |
Processes for management and maintenance of test data are standardized | |
Test code is stored in Source Control | |
It is possible to run tests using a single command | |
Tests are fully automated | |
Automation tools are standardized | |
It is possible to run tests in a dedicated Testing environment | |
It is possible to create and test a version consisting of a group of pending code changes | |
It is possible to run any test on any version of infrastructure, application, and data state | |
Test failures are likely to indicate a real defect | |
Identified defects are fully analyzed | |
Identified defects are immediately assigned to a relevant team member | |
Fixing new defects has the highest priority | |
A cross-company bug tracking service exists | |
Test summary reports are stored and accessible at any time | |
There is an easy-to-use centralized Test Management System | |
There is a self-service allowing any test to be run on any Development or Testing environment | |
A dedicated team manages the Test Management System, including defining standards and organization of existing tests | |
Continuous Testing is fully implemented |
Save | |
---|---|
Work integration into the main branch is only possible for fully-tested, production-ready code |
Save | |
---|---|
rspec, serverspec? |
Save | |
---|---|
Peer code reviews are conducted periodically | |
Static Code Analysis is performed automatically during application build process execution | |
During the Static Code Analysis process, technical debt is measured | |
During the Static Code Analysis process, coding conventions are verified | |
During the Static Code Analysis process, bad practices (anti-patterns) are detected | |
During the Static Code Analysis process, software metrics such as Code Coverage, Cyclomatic Complexity, Class Coupling, and Maintainability Index are calculated | |
During the Static Code Analysis process, security vulnerabilities are detected | |
Team members are automatically notified about code aberrations detected during Static Code Analysis |
Save | |
---|---|
Unit tests cover the smallest independent and testable parts of the source code which are usually individual methods or OOP classes | |
Unit tests cover at least 80% of the code | |
Mocks and proxies are used for external dependencies | |
Unit integration tests exist | |
Team members are automatically notified about code aberrations detected by the Unit Testing process |
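Using mocks for external dependencies, as the checklist item above requires, keeps unit tests isolated and fast. A sketch using the standard library's unittest.mock; the payment gateway and its API are hypothetical:

```python
# Sketch: mock an external dependency so the unit test never leaves process.
from unittest.mock import Mock

def charge_order(payment_gateway, order):
    """Unit under test: depends on an external payment gateway."""
    if order["total"] <= 0:
        return "skipped"
    payment_gateway.charge(order["id"], order["total"])
    return "charged"

gateway = Mock()
result = charge_order(gateway, {"id": "o-1", "total": 25})
# The mock records the call, so the interaction can be asserted directly.
gateway.charge.assert_called_once_with("o-1", 25)
```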
Save | |
---|---|
Integration testing is performed on the interfaces between individual modules, so that defects are detected in individual modules rather than in the entire system | |
Integration testing process is used to verify functional, performance, and reliability requirements placed on major design items | |
The Integration testing process runs only after the Unit testing process has finished successfully | |
Both success and error cases are being simulated during the Integration testing | |
Bottom-to-top Integration testing approach is NOT used | |
Team members are automatically notified about defects detected by Integration testing process |
Save | |
---|---|
System testing process is performed to evaluate the system's compliance with specified requirements | |
The System testing process runs only after the Integration testing process has finished successfully | |
System testing test cases are developed to simulate real-life scenarios | |
Creating test cases for System testing does not require knowledge of the inner design of the code or logic | |
Regression testing is performed to validate that newly introduced changes to the system do not introduce new defects | |
Non-regression testing is performed to validate that newly introduced changes to the system have the intended effect | |
Smoke (Sanity) testing is performed to validate that critical functionalities of the system are working as expected | |
Graphical user interface testing is performed to validate that its visual representation and functionality meets specifications | |
Usability testing is performed by observing people trying to use the system for its intended purpose, to validate its usability | |
Performance testing is performed to validate the correctness of system performance in terms of responsiveness and stability under a particular workload | |
Scalability testing is performed to validate the system's ability to scale up/down or out/in in response to load | |
Compatibility testing is performed to validate application's compatibility with the computing environment, such as hardware and OS | |
Exception handling testing is performed to validate the correct system's behavior during the occurrence of anomalous or exceptional conditions requiring special processing | |
Security testing is performed to reveal flaws in system's security mechanisms that protect data and maintain functionality | |
Accessibility testing is performed to validate the accessibility of the system to all people, regardless of disability type or severity of impairment | |
Team members are automatically notified about defects detected by System testing process |
Save | |
---|---|
Acceptance testing process is used to enable the user, customer or other authorized entity to determine whether or not to accept the system, based on their needs, requirements, and business processes | |
Acceptance testing environments are designed to be identical, or as close as possible, to the anticipated production environment | |
The Acceptance testing process runs only after the System testing process has finished successfully | |
User acceptance tests are specified by business customers or product owners as primary stakeholders | |
User acceptance tests are written in a Business Domain-Specific Language (such as Gherkin) | |
There is a manual process for User acceptance tests, performed by stakeholders | |
There is an automated process for User acceptance tests, performed during | |
Operational Acceptance Testing includes testing of component and network failover processes | |
Operational Acceptance Testing includes checking for presence of proper monitoring and alerts, including monitoring of SLA/OLA | |
Operational Acceptance Testing includes testing of data backup and recovery processes | |
Operational Acceptance Testing includes testing of disaster recovery processes | |
Operational Acceptance Testing includes checking of security vulnerabilities | |
Operational Acceptance Testing includes testing of deployment and rollback processes | |
Operational Acceptance Testing includes testing of application installation process (in cases when the application has to be installed on customer's computer) | |
There is a self-service procedure that allows creating separate environments dedicated to Operational Acceptance Testing | |
Team members are automatically notified about the results of Acceptance testing |
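User acceptance tests written in a business DSL such as Gherkin are ultimately bound to executable step implementations. A minimal sketch of that binding, with a deliberately tiny matcher and a hypothetical shopping-cart domain; real teams would use a framework such as Cucumber or behave rather than this hand-rolled loop:

```python
import re

STEPS = []

def step(pattern):
    """Register a step implementation for a Given/When/Then line."""
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register

def run_scenario(text, context):
    """Match each non-empty scenario line against the registered steps."""
    for line in filter(None, (l.strip() for l in text.splitlines())):
        for pattern, fn in STEPS:
            m = pattern.fullmatch(line)
            if m:
                fn(context, *m.groups())
                break
        else:
            raise AssertionError(f"No step matches: {line!r}")

# Hypothetical domain steps, for illustration only.
@step(r"Given an empty cart")
def _given(ctx):
    ctx["cart"] = []

@step(r"When the user adds (\d+) items?")
def _when(ctx, n):
    ctx["cart"].extend(range(int(n)))

@step(r"Then the cart contains (\d+) items?")
def _then(ctx, n):
    assert len(ctx["cart"]) == int(n)

SCENARIO = """
Given an empty cart
When the user adds 2 items
Then the cart contains 2 items
"""
```

Because the scenario text is plain business language, stakeholders can review it directly while the automated pipeline executes the bound steps.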
Item | Save |
---|---|
A centralized tool is used to provision infrastructure, deploy applications and perform data migrations to multiple target environments | |
Infrastructure provisioning, application deployment, and data migrations can be performed as a single, atomic process, or separately for each task | |
Quality Gateways ensure that relevant quality checks are passed before deployment to any environment |
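The Quality Gateways item can be sketched as a predicate evaluated before every deployment: each gate compares a measured value against its threshold, and deployment proceeds only if all gates pass. A minimal sketch; the metric names and thresholds are illustrative, not a recommendation:

```python
# Each gate: metric name -> (threshold, comparison mode).
GATES = {
    "unit_test_pass_rate": (1.00, "min"),
    "code_coverage":       (0.80, "min"),
    "critical_vulns":      (0,    "max"),
}

def failed_gates(metrics, gates=GATES):
    """Return the gate names the measured metrics do not satisfy."""
    failed = []
    for name, (threshold, mode) in gates.items():
        value = metrics.get(name)
        ok = value is not None and (
            value >= threshold if mode == "min" else value <= threshold
        )
        if not ok:
            failed.append(name)
    return failed

def may_deploy(metrics):
    """A missing metric counts as a failed gate: fail closed, not open."""
    return not failed_gates(metrics)
```

Failing closed on missing metrics is the key design choice: a broken measurement pipeline should block deployment rather than silently wave releases through.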
Item | Save |
---|---|
Infrastructure provisioning process is documented | |
Infrastructure provisioning process is performed entirely from code stored in Source Control | |
Infrastructure provisioning process is fully automated | |
It is possible to run the Infrastructure provisioning process with a single command | |
Infrastructure is automatically validated after being provisioned | |
In case of failure, it is possible to roll back and reprovision a working infrastructure version | |
In case of failure, the rollback process is triggered automatically | |
There is a self-service process for provisioning of any infrastructure version to Development and Test environments |
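Provisioning entirely from code usually means describing the desired infrastructure declaratively and letting a tool converge the environment toward it, which is the model behind tools like Terraform. A minimal, in-memory sketch of the plan/apply cycle over simple state maps; the resource names are hypothetical and nothing here talks to a real cloud:

```python
def plan(current, desired):
    """Diff the current state map against the desired one into actions."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name, spec))
        elif current[name] != spec:
            actions.append(("update", name, spec))
    for name in current:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

def apply_plan(current, actions):
    """Return the new state map after applying the planned actions."""
    state = dict(current)
    for op, name, spec in actions:
        if op == "delete":
            del state[name]
        else:
            state[name] = spec
    return state
```

Because the plan is a pure diff, applying the same desired state twice is a no-op, which is what makes single-command, repeatable provisioning safe.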
Item | Save |
---|---|
Application deployment process is documented | |
Application deployment process uses only build artifacts stored in a centralized Package Management System | |
The same build artifact is deployed to Test and Production environments | |
Application deployment process is fully automated | |
It is possible to run the application deployment process with a single command | |
Applications are automatically validated after being deployed | |
In case of failure, it is possible to roll back and redeploy a working application version | |
In case of failure, the rollback process is triggered automatically | |
There is a self-service process for deployment of any application version to Development and Test environments |
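The validate-after-deploy and automatic-rollback items above form a single pattern: deploy a versioned artifact, run an automated post-deploy check, and redeploy the last known-good version if the check fails. A minimal sketch; `deploy` and `healthy` are hypothetical hooks into a real deployment tool and a real health endpoint:

```python
def deploy_with_rollback(deploy, healthy, new_version, last_good):
    """Deploy new_version; on a failed health check, redeploy last_good."""
    deploy(new_version)
    if healthy():
        return new_version        # validation passed, keep the new version
    deploy(last_good)             # automatic rollback
    if not healthy():
        raise RuntimeError(f"rollback to {last_good} failed")
    return last_good
```

Keeping `last_good` as an immutable artifact in the Package Management System is what makes the rollback branch trustworthy: it redeploys exactly the bits that worked before.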
Item | Save |
---|---|
Data and schema migration process is documented | |
Data and schema migration process is performed entirely from code stored in Source Control | |
Data and schema migration process is fully automated | |
It is possible to run the data and schema migration process with a single command | |
Data and schema migrations are automatically validated after being performed | |
In case of failure, it is possible to roll back to a working data and schema state | |
In case of failure, the rollback process is triggered automatically | |
There is a self-service process for deployment of a lightweight version of data used in Production to Development and Test environments |
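Versioned, reversible migrations are commonly implemented as an ordered list of up/down script pairs plus a version table recording what has been applied, the approach behind tools like Flyway or Alembic. A minimal sketch against an in-memory SQLite database; the two-table schema is purely illustrative:

```python
import sqlite3

# Ordered migrations: (version, up_sql, down_sql).
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE users"),
    (2, "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)",
        "DROP TABLE orders"),
]

def current_version(conn):
    """Highest applied migration version, 0 for a fresh database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0

def migrate(conn, target):
    """Apply up or down migrations until the database sits at `target`."""
    version = current_version(conn)
    for v, up, _ in MIGRATIONS:                 # upgrade path
        if version < v <= target:
            conn.execute(up)
            conn.execute("INSERT INTO schema_version VALUES (?)", (v,))
    for v, _, down in reversed(MIGRATIONS):     # rollback path
        if target < v <= version:
            conn.execute(down)
            conn.execute("DELETE FROM schema_version WHERE version = ?", (v,))
    conn.commit()
```

`migrate(conn, len(MIGRATIONS))` is the single-command upgrade; `migrate(conn, current_version(conn) - 1)` is the single-command rollback.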
Item | Save |
---|---|
The release process is documented | |
Continuous Operations approach is fully implemented, and the release process does not require any downtime | |
Deployment to production and release to production are defined and performed as two separate processes | |
Feature toggles are accessible through an easy-to-use interface | |
New features are incrementally released to groups of customers (Canary releases) | |
In case of failure, it is possible to roll back to a working version of the system | |
In case of failure, the rollback process is triggered automatically | |
Continuous Delivery is fully implemented | |
Continuous Deployment is fully implemented | |
Release notes are auto-generated after each release |
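Canary releases behind feature toggles are often implemented by hashing a stable user identifier into a bucket from 0 to 99 and enabling the feature for buckets below the rollout percentage, so each user gets a consistent answer and the enabled population only grows as the percentage is raised. A minimal sketch; the feature and user names are hypothetical:

```python
import hashlib

def is_enabled(feature, user_id, percent):
    """Deterministically enable `feature` for roughly `percent`% of users."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100   # stable bucket in [0, 100)
    return bucket < percent
```

Raising the rollout from 5% to 50% keeps the original 5% of users enabled, because each user's bucket never changes; setting the percentage to 0 is an instant kill switch.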
Item | Save |
---|---|
Top-down metrics catch outages | |
Bottom-up metrics tell you what's wrong | |
Effectiveness of interlinked DevOps processes across the delivery pipeline, such as test-driven development, continuous delivery, and response times, is measured | |
Bottlenecks within these processes are identified | |
Functionality of each product and its components is monitored | |
Reliability of each product and its components is monitored | |
Usability of each product and its components is monitored | |
Efficiency of each product and its components is monitored | |
Maintainability of each product and its components is monitored |
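The top-down/bottom-up split above can be sketched as one aggregate error-rate alarm (catches the outage) plus a per-component breakdown consulted once it fires (tells you what's wrong). A minimal sketch over request records; the component names and the threshold are illustrative:

```python
def error_rate(requests):
    """Fraction of failed requests; the top-down signal."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if not r["ok"]) / len(requests)

def breakdown_by_component(requests):
    """Error rate per component; the bottom-up drill-down."""
    per = {}
    for r in requests:
        per.setdefault(r["component"], []).append(r)
    return {name: error_rate(rs) for name, rs in per.items()}

ALARM_THRESHOLD = 0.05  # illustrative alert level, not a recommendation
```

The aggregate number pages someone; the breakdown tells them which component's dashboard to open first.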
Item | Save |
---|---|
Expected business value of each delivered feature or improvement is monitored and verified | |
Revenue per User Story is monitored | |
Business transactions are monitored | |
User interactions are monitored | |
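Verifying the expected business value of a delivered feature means joining revenue events back to the User Story that produced them and comparing the total against the estimate made before delivery. A minimal sketch; the story IDs, amounts, and tolerance are hypothetical:

```python
def revenue_per_story(events):
    """Sum (story_id, amount) revenue events into a per-story total."""
    totals = {}
    for story_id, amount in events:
        totals[story_id] = totals.get(story_id, 0.0) + amount
    return totals

def value_verified(expected, actual, tolerance=0.2):
    """True per story when actual revenue is within tolerance of the estimate."""
    return {
        story: abs(actual.get(story, 0.0) - estimate) <= tolerance * estimate
        for story, estimate in expected.items()
    }
```

Stories that repeatedly fail verification are a signal that value estimates, not just delivery, need attention.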
Item | Save |
---|---|
Social media is monitored | |
User reviews are monitored | |
Net Promoter Score is monitored | |
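Net Promoter Score is derived from answers to "how likely are you to recommend us?" on a 0–10 scale: the percentage of promoters (9–10) minus the percentage of detractors (0–6); passives (7–8) only dilute the score. A minimal sketch of the computation:

```python
def nps(scores):
    """Net Promoter Score in [-100, 100] from 0-10 survey responses."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)
```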
Item | Save |
---|---|
Production environment is monitored for availability | |
Production environment is monitored for performance |
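Availability and performance monitoring reduce to two recurring numbers: the fraction of successful health checks over a window, and a high latency percentile such as p95. A minimal sketch of both, using the nearest-rank percentile method; the sample data in the test is hypothetical:

```python
def availability(checks):
    """Fraction of successful health checks (1.0 == 100% available)."""
    return sum(checks) / len(checks)

def percentile(latencies_ms, p):
    """Nearest-rank percentile, e.g. p=95 for p95 latency."""
    ordered = sorted(latencies_ms)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p*n/100), at least 1
    return ordered[rank - 1]
```

Percentiles are preferred over averages for performance because a mean hides the slow tail that users actually complain about.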
Item | Save |
---|---|
TBD |
Item | Save |
---|---|
Cost of execution is monitored (e.g. salaries, time) | |
Cost of cloud resources is monitored (e.g. AWS) | |
Cost of equipment is monitored (e.g. laptops) | |
It is possible to view the cost of work at any moment | |
Operating cost is monitored (see https://en.wikipedia.org/wiki/Operating_cost) | |
Total CapEx and OpEx cost reduction compared to other approaches is measured (e.g. via an ROI case study) | |
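Viewing the cost of work at any moment implies being able to aggregate resource costs into a run-rate on demand, and an ROI-style comparison is just the relative difference between two run-rates. A minimal sketch; the team names and amounts are hypothetical:

```python
def run_rate(resources):
    """Monthly run-rate per team from (team, monthly_cost) resource records."""
    totals = {}
    for team, monthly_cost in resources:
        totals[team] = totals.get(team, 0.0) + monthly_cost
    return totals

def cost_reduction(old_monthly, new_monthly):
    """Fractional reduction of the new approach vs the old (0.25 == 25%)."""
    return (old_monthly - new_monthly) / old_monthly
```

Tagging every resource with an owning team at provisioning time is what makes this aggregation possible on demand rather than once a quarter.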
Item | Save |
---|---|
TBD |