Friday, 31 October 2014

Why I Am Pushing BDD


Test Driven Development (TDD)

TDD, as usually practiced, focuses on writing automated Unit Tests. The developer writes a test that fails until the code performs the functionality that the test expects: first the coder writes the test, then the coder writes the functionality that makes the test pass. Refactoring the code to improve structure and readability is commonly done as a final tidy-up step. The main focus of TDD is on testing the low-level functionality in single modules, classes or methods. Kent Beck wrote Test Driven Development: By Example (ISBN 978-0321146533) in 2002, and is credited with developing and promoting TDD as a practice.
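To make the red-green-refactor rhythm concrete, here is a minimal sketch in Python (using pytest; the Account class and the tests are purely illustrative, not taken from Beck's book):

    # A minimal red-green-refactor sketch (illustrative names only).
    import pytest


    class Account:
        """Production code, written *after* the tests below, with just enough
        behaviour to make them pass, then refactored while the tests stay green."""

        def __init__(self, balance=0):
            self.balance = balance

        def deposit(self, amount):
            if amount < 0:
                raise ValueError("deposit amount must be non-negative")
            self.balance += amount


    # The tests are written first: they fail (red) until Account behaves as expected (green).
    def test_deposit_increases_balance():
        account = Account(balance=100)
        account.deposit(25)
        assert account.balance == 125


    def test_negative_deposit_is_rejected():
        account = Account(balance=100)
        with pytest.raises(ValueError):
            account.deposit(-5)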

Beck said “Test-Driven Development is a great excuse to think about the problem before you think about the solution”.

TDD is one of the core practices of Agile's XP Technical Practices and is frequently credited with being one of the reasons for the low defect density associated with Agile projects.

Criticism: 

1. David Heinemeier Hansson (Agile developer and creator of Ruby on Rails and Instiki Wiki) compares Agile's inflexible requirement for test-first development to religious fundamentalism. Hansson is a strong supporter of Automated Regression Testing, but questions the value of always writing the unit test first. He asserts that test-first development is leading to inelegant software design.

2. James O. Coplien (often known as "Cope") is one of the founders of Agile and a leading software theoretician who offers a similar view - suggesting that "Most Unit Testing Is Waste". He points out that the massive computational complexity of testing every unit through every pathway renders the idea of 100% unit test coverage unachievable. He is not opposed to the concept of writing tests to prove that the functionality of code matches the requirements - he simply questions the value of broad testing at the unit level, and his advice is to focus most attention on Functional and Integration Tests. He provides this guidance:

  • "Low-Risk Tests Have Low (potentially negative) Payoff "
  • "So one question to ask about every test is: If this test fails, what business requirement is compromised? Most of the time, the answer is, "I don't know." If you don't know the value of the test, then the test theoretically could have zero business value. The test does have a cost: maintenance, computing time, administration, and so forth. That means the test could have net negative value." 
  • "If you cannot tell how a unit test failure contributes to product risk, you should evaluate whether to throw the test away. There are better techniques to attack quality lapses in the absence of formal correctness criteria, such as exploratory testing and Monte Carlo techniques. (Those are great and I view them as being in a category separate from what I am addressing here.) Don’t use unit tests for such validation." 
  • "there are some units and some tests for which there is a clear answer to the business value question. One such set of tests is regression tests; however, those rarely are written at the unit level but rather at the system level." 
  • "if X has business value and you can text X with either a system test or a unit test, use a system test — context is everything."
  • "Design a test with more care than you design the code."
  • "Turn most unit tests into assertions."


Response to criticism: 

Kent Beck has mounted a spirited defence against these criticisms. Among other things, he responds that TDD is NOT intended to be universally applicable. In conversations with Martin Fowler (http://martinfowler.com/articles/is-tdd-dead/) he further points out that much of the criticism is about the granularity and extent of testing, but this is a decision that should come down to the preference of individual developers:


  • "TDD puts an evolutionary pressure on a design, people have different preferences for the grain-size of how much is covered by their tests." 

Beck also makes the comment (with regard to the level at which tests are written and the utility of maintaining tests) that he would:


  • "often write a system-y test, write some code to implement it, refactor a bit, and end up throwing away the initial test. Many people freak out at throwing away tests, but you should if they don't buy you anything" 

This certainly suggests that Kent Beck has a more pragmatic approach to TDD than some of the Agile evangelists who are promoting the practice. His guidance can perhaps be used, in conjunction with the other comments above, as a guide when trying to decide what tests to write and at what level the tests should be written.

Acceptance Test Driven Development (ATDD)

ATDD is a collaborative exercise that involves product owners, business analysts, testers, and developers. By defining the tests that need to pass before the functionality will be accepted, ATDD helps to ensure that all project members understand precisely what needs to be implemented. Once the acceptance tests are implemented in an automated testing suite, ATDD continues to verify that the functionality works - providing regression testing after code or configuration changes.
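As a sketch of what this can look like in practice (a pytest-style example with hypothetical shipping-fee rules, not a prescribed ATDD tool), the examples agreed with the Product Owner are captured as a table and run automatically, so they double as a regression suite:

    # Sketch: PO-agreed acceptance examples captured as a table and run automatically.
    # The shipping rules and calculate_shipping function are hypothetical stand-ins.
    import pytest


    def calculate_shipping(order_total, country):
        """Stand-in for the real implementation under test."""
        if country == "AU":
            return 0 if order_total >= 100 else 10
        return 25


    # Each row is one example the Product Owner has signed off on.
    ACCEPTANCE_EXAMPLES = [
        (150, "AU", 0),   # domestic orders of $100 or more ship free
        (99, "AU", 10),   # domestic orders under $100 pay a flat $10
        (150, "NZ", 25),  # all international orders pay a flat $25
    ]


    @pytest.mark.parametrize("order_total,country,expected_fee", ACCEPTANCE_EXAMPLES)
    def test_shipping_fee_matches_agreed_examples(order_total, country, expected_fee):
        assert calculate_shipping(order_total, country) == expected_fee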

So, in theory, TDD focuses on the low level and ATDD focuses on the high level, thus suggesting a clear separation. But in practice it simply isn't that clean. Many coders struggle with questions like what to test, where (at what level) tests should "live" and how to demonstrate to customers that a code-based Acceptance Test actually proves that the code is performing as defined in the requirements.

What is the Difference - TDD Vs ATDD

The concepts that drive Test-Driven Development are related to those that drive Acceptance Test-Driven Development (ATDD). Consequently, tests used for TDD can sometimes be compiled into ATDD tests, since each code unit implements a portion of the required functionality. However, TDD is primarily a developer technique intended to create a robust unit of code (module, class or method). ATDD has a different goal - it is a communication tool between the customer, developer, and tester, intended to ensure that the requirements are well-defined. TDD requires test automation. Although ATDD is usually automated, this is not strictly required.

Limitations of Conventional ATDD as a Communication Tool

ATDD is envisaged to be a collaborative exercise that involves product owners, business analysts, testers, and developers. In theory, these stakeholders work together to design ATDD tests that provide automated code-based traceability from requirements to functionality. But this is difficult to do in practice because the tests are specified and written in a programming language that is usually incomprehensible to non-technical stakeholders.

This opaqueness acts as a blocker to communication and makes the required collaboration with the PO and other non-technical stakeholders difficult for the developer and frustrating for the stakeholders. Can the PO agree that an automated test effectively addresses the required functionality, if the automated test is implemented using code that is opaque to the non-technical PO? How can the PO understand what is being tested if there is no shared and mutually comprehensible language for specifying the coded tests? Or to put it another way - how can we specify code and test behaviour in a way that is comprehensible enough to achieve customer sign-off, yet precise enough to be directly mapped to code? These questions led to a new approach, known as Behaviour Driven Development (BDD).

Behaviour Driven Development (BDD)

BDD was an effort pioneered by Dan North (http://dannorth.net/introducing-bdd/) to clarify questions like:

  • What should I test?
  • How do I derive tests from Acceptance Criteria?
  • How can I make automated tests that are easy for my customer to understand?
  • How will the customer confirm traceability from requirements to tests?
It provides a philosophical framework for approaching everything from eliciting requirements to defining tests.

BDD provides a shared and mutually comprehensible technique for specifying the tests, by linking the tests back to the Acceptance Criteria, and specifying that the Acceptance Criteria should be defined using scenarios in a "Given-When-Then" format (see the section on How To Write Conditions of Satisfaction and Acceptance Criteria). The Given-When-Then format supports the definition of scenarios in terms of:

  • preconditions/context (Given)
  • input (When)
  • expected output (Then)
This style of scenario construction lends itself well to defining tests. The tests are easy to specify and, once coded, their behaviour is easy to demonstrate to a non-technical PO. Since each test implements a scenario, it can be instantly grasped by the PO.
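As a rough sketch (hypothetical login code, pytest style), a Given-When-Then scenario maps almost line for line onto an automated test, which is what makes it easy to walk the PO through exactly what is being checked:

    # Scenario: a registered user with valid credentials is logged in
    #   Given a registered user with valid credentials
    #   When the user submits those credentials
    #   Then the user is logged in
    # (authenticate and the in-memory user store below are hypothetical stand-ins.)

    USERS = {"alice": "correct-horse-battery-staple"}


    def authenticate(username, password):
        """Stand-in for the real login code under test."""
        return USERS.get(username) == password


    def test_registered_user_with_valid_credentials_is_logged_in():
        # Given a registered user with valid credentials
        username, password = "alice", "correct-horse-battery-staple"
        # When the user submits those credentials
        logged_in = authenticate(username, password)
        # Then the user is logged in
        assert logged_in is True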

BDD links Acceptance Criteria to tests in a way that is both customer-focussed and customer-friendly.  BDD is usually done using very natural language, and often uses further tools to make it easy for non-technical clients to understand the tests - or even write them. "Slim" is an example of such a tool (or FitSharp in the .NET world). BDD is thus intended to allow much easier collaboration with non-technical stakeholders than TDD.

In line with the benefit-driven approach, BDD frequently changes the order in which the usual formula for story writing is framed. Instead of the usual formula:

As A…. <role>
I Want… <goal/desire>
So That… <receive benefit>

The new BDD order puts the benefit first:

In Order To… <receive benefit>
As A… <role>
I Want… <goal/desire>

But all this doesn't mean that BDD should replace TDD or ATDD. BDD can be used as a philosophy behind both TDD and ATDD, helping the coder decide what to test and which tests should live where.

Even more importantly the BDD "Given, When, Then" formula allows the conversation with the customer (PO or BA) to continue into the domain of acceptance tests - further validating the coder's understanding of the product while simultaneously further validating the customer's confidence that the code does what is expected.


Experiment Driven Development

Experiment Driven Development (EDD) is a relatively new concept in Agile.

The theory is that implementing TDD, ATDD or BDD only works if the assumptions about business benefits (usually made by the PO) are correct! EDD suggests that we test those assumptions with lightweight experiments before we commit a major coding effort to the concept.

EDD suggests that if the Product Owner fails to grasp the big picture, or makes a wrong assumption, then TDD and BDD will simply drive optimizations in one area that negatively impact the business as a whole (in line with the predictions of Systems Thinking and Complexity Theory that unexpected outcomes are the norm in Complex Adaptive Systems). The EDD recommendations are in line with the concept of Agile Experiments.

To support the experimental nature of this type of development, EDD introduces a User Story formula for experimental stories:

We believe <this capability>
Will result in <this outcome>
We will know we have succeeded when <we see this measurable sign>

To implement this story formula, the first step is to develop a hypothesis concerning features that might provide business benefits, then construct a story around the hypothesis:

We believe <this capability>

  • What functionality could you develop to test the hypothesis? Define a test for the product or service that you are intending to build.  


Will result in <this outcome>

  • What is the expected outcome of your experiment? What specific business benefit do you expect to achieve by building this test functionality?


We will know we have succeeded when <we see this measurable sign>

  • Based on the possible business benefits that the test functionality might achieve (remember that there are likely to be several), what key metrics (qualitative or quantitative) will you measure?


Example
An example of an EDD User Story might be:

We Believe That increasing the size, prominence and graphic nature of images of fire damage and casualties on our page

Will Result In earlier civilian evacuation prior to a fire

We Will Know We Have Succeeded When we see a 5% decrease in unprepared civilians staying in their homes.

Implementing EDD

In line with common Agile Experiment practice, EDD experiments should be lightweight, fast to implement (to gain feedback while it is still relevant) and targeted at areas of high value (either because there is a belief that the area will yield high business benefits, or because the area is believed to hold high-impact risks).

Conclusion and Further References

TDD and BDD focus on how to build the current Product, while EDD focusses on where to take the product next.



Agile Traceability - Moving from Stories to Tests in a Way That Keeps Stakeholders Engaged

  
Clarifying the requirements of a User Story is a problem of communication. The Development Team requires unambiguous requirements, defined in a form that translates easily into testable code. So developers would prefer requirements that are defined in a technical language such as UML. However the ultimate source of requirements is the Product Owner, who is rarely qualified to deliver requirements in a technically precise language.

The approach to communication outlined below provides one way to iteratively evolve requirements into technically precise and unambiguous specifications.

The Difference Between "Conditions of Satisfaction" and "Acceptance Criteria"

The Conditions of Satisfaction (CoS) define the boundaries of a User Story and are usually written on the back of the story card. These conditions are rarely detailed enough to fully define what needs to be coded (nor should they be) but will be used to start the conversation.

The outcome of the conversation should be more detailed Acceptance Criteria. The Acceptance Criteria are used to provide a testable way to confirm that a story is completed and working as intended. The CoS defines what the PO expects, while the Acceptance Criteria define how the CoS will be demonstrated.

  • CoS capture the expectations of the PO - and more importantly the intent behind the story. They exist at various levels (Epic, story and frequently even Iteration).
  • At the story level, the CoS are unlikely to completely define how the story should be coded. As always in the Agile process, this will be done at the last responsible moment (see "The Right Conversation at The Right Time" below).
  • Acceptance Criteria define how the PO will confirm that a story has been implemented in line with expectations. Thus the acceptance criteria are a formal and less ambiguous definition of the CoS.
Incorporating Acceptance Criteria into your user stories and tasks has several benefits:
  • they get the team to think through how a feature or piece of functionality will work from the user’s perspective
  • they remove ambiguity from requirements
  • they form the basis of tests that will confirm that a feature or piece of functionality is working and complete

How To Write Conditions of Satisfaction

A Condition of Satisfaction:
  • Helps clarify objectives and intent. 
  • Applies to a specific product backlog item
  • Defines a condition that must be true for that product backlog item to be considered done.
For example, a user story such as, "As a user, I want to login so that I can use the site functions," might include these conditions of satisfaction:
  • User starts logged out
  • User is logged in only when proper credentials are provided
  • User has access based on user role
  • User is locked out after three failed attempts
These conditions are clearly not sufficient to code from; they are intended as a starting point for the conversation. It is rare to find that exhaustive acceptance criteria and tests were defined as part of story creation. The iteration down from Conditions of Satisfaction to Acceptance Criteria and tests usually occurs at, or close to, the last responsible moment.
 
Why do we leave this final clarification until so late? Because in Agile we try to avoid wasted effort:
  • Sometimes we learn things from the last story that will help define Acceptance Criteria for the next story.
  • Sometimes the PO decides not to do the story.
  • Sometimes the PO decides to change the story
  • Sometimes we discover in the course of doing a previous story that not all of the product owner’s conditions of satisfaction can be met, and we will need to negotiate or compromise - thus changing the Acceptance Criteria.
Some of the success of the Agile process comes down to minimising wasted effort by delaying effort until it is definitely needed. This waste minimisation is sometimes termed "Maximising the Work Not Done". 
 

The Right Conversation At The Right Time

Exploration of the product owner’s conditions of satisfaction is a highly iterative process. 

As an example: During Project Initiation (perhaps at a Discovery Workshop), the team and product owner will collaboratively explore the product owner’s Conditions of Satisfaction. These Conditions will go beyond scope to include schedule, budget, and quality. The team and product owner look for ways to meet all of the conditions of satisfaction. If no feasible approach can be identified, then the conditions of satisfaction must be renegotiated. For example, if schedule is the issue then the product owner may prefer a release in one month that includes one set of user stories rather than a release five months later that includes additional user stories.

Subsequent conversations may occur at various points. This iterative process allows you to have the right conversations at the right time. 

The best time for the team to be thinking about the detailed implementation of stories and tasks is just prior to implementing those stories, as this ensures that the understanding is still fresh in mind during coding. Consequently, Acceptance Criteria are frequently written in conjunction with the PO during Backlog Refinement. Acceptance Tests may be written at the same time as the Acceptance Criteria, or delayed until the moment the task is actually picked up from the board, at which point the developer seeks to work in conjunction with a tester and the PO or BA to define tests based on the Acceptance Criteria.

How To Write Acceptance Criteria

Acceptance Criteria provide a communication tool allowing the PO, Tester and Coder to agree on an unambiguous definition of a task or story. This is a communication tool for crossing the boundary between technical and non-technical, so Acceptance Criteria must be written in simple language, but should be easily translated into automated tests.

Defining a simple scenario for each task is an easily understood way to achieve this goal, as scenarios are usually specific enough to provide the user action (input) and expected outcome (output) in a testable way while being clear enough for non-technical POs to understand.

The Given-When-Then formula is an approach that works well at both generating scenarios and providing a good basis for formulating the tests.

Using the Given-When-Then Formula to Write Acceptance Criteria


Although primarily used as a tool for helping to define tests for Acceptance Criteria, the Given-When-Then formula is a template that can be used at every level, from Epic down to task:
  • (Given) some context
  • (When) some action is carried out
  • (Then) a particular set of observable consequences should obtain
 
An example (From Dan North: http://dannorth.net/introducing-bdd/ ):
  • Given the account is in credit
  • And the card is valid
  • And the dispenser contains cash
  • When the customer requests cash
  • Then ensure the account is debited
  • And ensure cash is dispensed
  • And ensure the card is returned
Notice the use of “and” to connect multiple givens or multiple outcomes in a natural way.
BDD is intended to guide the writing of acceptance tests for a User Story or task. Tools such as JBehave, RSpec or Cucumber encourage use of this template.
 
If your Given-When-Then scenario is fine-grained enough to allow a definition of input and output, then you are ready to write your test.
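As an illustration of that last step (a sketch only, assuming the Python behave library; the account and dispenser objects are hypothetical stand-ins), the cash-withdrawal scenario above can be wired to step definitions that execute it:

    # steps/cash_withdrawal_steps.py -- behave step definitions for the scenario above.
    # The Given/When/Then lines themselves live in a plain-text .feature file;
    # behave matches each line to one of the decorated functions below.
    from behave import given, when, then


    @given("the account is in credit")
    def account_in_credit(context):
        context.account = {"balance": 100, "debited": 0}


    @given("the card is valid")
    def card_is_valid(context):
        context.card_valid = True


    @given("the dispenser contains cash")
    def dispenser_contains_cash(context):
        context.dispenser_cash = 500


    @when("the customer requests cash")
    def customer_requests_cash(context):
        # Stand-in for calling the real ATM code under test.
        amount = 20
        context.account["debited"] += amount
        context.dispensed = amount
        context.card_returned = True


    @then("ensure the account is debited")
    def account_is_debited(context):
        assert context.account["debited"] > 0


    @then("ensure cash is dispensed")
    def cash_is_dispensed(context):
        assert context.dispensed > 0


    @then("ensure the card is returned")
    def card_is_returned(context):
        assert context.card_returned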

How It Works

The system works like this:
1.        The initial Conditions of Satisfaction should have been written when the Story was created. The Conditions of Satisfaction are then refined and will probably be elaborated down to Acceptance Criteria during Backlog Refinement, in order to reduce ambiguity and thus enable the estimation of the story. Tests may be defined at this time.
2.        During the Stand-Up a developer claims one or more tasks (as always). At that point, if acceptance criteria have not been defined down to the test level, the coder agrees to meet with the tester and (ideally) the PO or BA right after the Stand-Up to define the acceptance tests. The tester works with the coder and BA or Product Owner to write an appropriate scenario for each task (or more than one scenario, if needed, but keep the number to the minimum possible). This scenario should be simple enough to convert into the Given-When-Then form. When the scenario(s) have been written and converted into given-when-then test(s), the PO should confirm that this fully illustrates the task and this confirmation should be captured. (Note: Automated tests provide protection against unexpected code interactions or behaviors; however, tests are also an overhead. As a guideline: At this stage, the coder should only code the tests required to demonstrate the agreed scenarios. This yields enough tests to provide complete requirements traceability. This traceability is important because if the application passes the automated tests, all requirements are demonstrated to be present and functioning as expected - an automated "Traceability Matrix".)
3.        The coder can choose to implement tests first (if the coder is using TDD or BDD) or write the code first and confirm function by implementing the agreed tests. As soon as the coder completes the task and tests, the coder calls in the tester and any other interested party, such as the Scrum Master and/or PO, for a “Desk Check”, in which the code is shown with the functionality running and passing the Acceptance Tests. The coder receives immediate feedback from the tester and other stakeholders present concerning whether the code is performing as expected.
4.        The tester may then run further tests on the code (Edge conditions, Usability, Accessibility, etc) and then “sign off”.
5.        Final PO sign-off occurs, as always, at the end of the sprint after the review or showcase.

Rant: Do I Have To?

Do I Have To?
“Do I have to do this?” or “Do we have to do this?”

I go into rant mode whenever I hear this question. Normally it will only come from a new team member.

I have a standard answer:

“Errrrr… no. How can I make you? You are a head-kickingly good developer in a high-performance team. So what am I going to do? Fire you? I would need to be crazy. So maybe I should yell at you? Really loud? You know that I am crap at yelling at people…. So if I can’t fire you and I can’t yell at you…..  When you think about it, I have no power to compel you to do anything really, do I?"

I wait for that to sink in. Normally they don't answer, so I continue the rant.

"You can say ‘no’ to me any time you like. If you think this is the wrong thing to do, or not the best use of your time, you should definitely say ‘no’. If you drop it, someone else in your team will pick it up. So if you think this is not the right thing to do, don’t do it .”

To which the reply is usually “Errrr, well it’s not that this is the wrong thing to do…. It’s just errr”. And that is the end of it. I never hear the question again. I hear “yes” or “no” but never “Do I have to?”

I call this the “Fuck You David” lesson. I have no authority except whatever authority the team invests in me. The sooner they get their head around that the better.

It is a terrifying realization for the new team member, because suddenly he/she realises that he/she is in charge. The buck stops with them, not with me.

Team members can tell me “no” any time they like. They just mostly choose not to…. In fact, after a rant like that they generally choose not to even ask me if they “have to” :-)

Rant: Want a Happy Team? Put them in Control.

Are you happy?
A few months ago I was a fill-in Scrum Master for an established team. I relished the chance to work directly with a single team - the opportunities are getting fewer with each year that passes.

The team was new to me, but they were experienced. The Stand-up was smooth, finishing in less than 11 minutes. So I did what I often do at the end of a meeting when there are a few minutes to spare.

“Are you happy?” I asked.

Silence.

Then someone said “That’s a bit of a loaded question isn’t it?”

Sadly, this response did not surprise me. Teams aren’t used to anybody asking about their happiness.

This is strange. Happy teams are generally more successful and more productive than unhappy teams – so even if they don't have an altruistic or empathic bone in their bodies, leaders should care about their team's happiness for the productivity advantages alone. So why do they never ask?

“Actually, he means it”, said the only team member who had ever worked with me before.

“I do,” I said. “Seriously, are you guys happy doing this, or does this project suck?”

Stunned silence.

With 3 minutes left in my time-box for this meeting, I concluded that this approach wasn’t going anywhere, so I decided to finish with a monologue that might help them understand.

“OK,” I said, “We all know that projects have ups and downs, but the fact is that you guys are going to do a better job if I can rig things so that you are happy. If you wake up one morning and think ‘I don’t want to go to work.’ I’d like to know, and I’d like to know why." Then I took a stab in the dark - naming the most common reason that Development teams feel unhappy... "If you start to feel like you are just a small cog in a machine that is grinding you into the ground, and you feel like you have no control over the machine… we need to change that. Think of it as being like inversion of control – this team needs to control the machine. My job is to make sure you do control the machine. We need to control everything in our environment that impacts on your ability to do your job. We can’t afford to have you guys feel frustrated for even 1 second. I know this won’t be easy or fast but you tell me what needs to happen to control the machine, we will prioritise your list, plan an approach, and then I’ll do my level best to get it happening for you.”

Disbelieving silence. Then a small voice pointed out “But we are only small cogs in the machine. We don’t control anything, we're just the dev team.”

So the next day we made a list, and I started work. I felt like, having talked the talk, I'd better walk the walk. It felt like jumping off a cliff and hoping that I could figure out how to build a parachute on the way down.

The plan to take control wasn’t simple - but it was quick to implement. We had to change the concept of “team” to incorporate those directly upstream and downstream into our new “extended virtual team”. The plan involved inviting them to our Scrum of Scrums and aligning their vision statement with ours (pretty easy, since they didn’t have a vision statement). It turned out that the managers in charge of the other teams were strongly supportive of better communication and integration and happy to have their team members "get updates" from my team.

Our "updates" quickly turned into discussions and agreements about small workflow changes and process improvements, most of which we just slipped into place immediately. We felt no need to seek approval, so we didn't.  

Did it work? Are they a happier team for this? Three weeks later I left the team and went off to find my next problem. Their new Scrum Master tells me that they raised my loss as an impediment. He also tells me that they are holding him responsible for helping them control their environment – something that he is quite happy about :-)

So don’t be afraid to ask your team “Are you happy?” They might need some coaching on what such a "loaded question" means, but once they understand you are serious be prepared for things to change.....and be prepared to jump off that cliff.

Thursday, 30 October 2014

Agile Experiments - Mitigating Risk, Maximising Benefits, and Guiding The Way

Overview

Complexity Theory tells us that the systems that we deal with (corporations, business units, development teams, and the interactions between them) are Complex Adaptive Systems.

In Complex Adaptive Systems, using experiments to guide decisions will frequently outperform using expert opinion.


A CAS is a system in which the components interact dynamically, frequently producing unexpected outcomes. (How often have you checked in and run new code, or sent off an email, or just asked a question, and then been surprised by the explosion that followed?)

These unexpected outcomes make predictions (and thus plans) unreliable, so Complexity Theory suggests that when dealing with a CAS and a substantial number of unknown factors, using experiments to guide decisions will frequently outperform using expert opinion.

Unexpected interactions are not always negative. In addition to reducing our risk profile, experiments allow us to identify unexpected opportunities.

The kind of uncertainties that we deal with exist at many levels, ranging from code behavior, through requirement elicitation, to organisational uncertainty:

  • How will this new code module interact with the old code?
  • Is there really a business case underlying this unclear user story?
  • How will overseas users respond to this UX change?
  • How will the governance process owner respond to this Agile planning approach?
  • How will the Operations team respond when they are told that they need to support these new products with Automated Regression Testing frameworks built into the code?



Examples of experiments that might guide decision making include:

  • Rapid Prototyping 
  • Wireframes
  • Walkthroughs
  • Reconnaissance by coding 
  • Buy the head of the Project Management Office (PMO) a coffee and run him through a new process over Afternoon Tea
  • Implement a new process incrementally in one area of the organisation

Why Experiments?

About Complex Systems:
  • A complex system has no repeating relationships between cause and effect. For example: Offering the Apple Mac in citrus colours was a big hit for Apple in the late 1990s - it is frequently credited with saving the company - but if they did it again today, would it work again? In complex systems, reliable repeatability simply cannot be expected - the same answer rarely works twice. To quote a lecturer in economics who was asked why his exam papers were identical every year: "Sure, the questions are always the same, but the correct answers change every year!"
  • A complex system is highly sensitive to small interventions - the famous "Butterfly Effect". For example, it is common for a tiny change to a piece of code or a configuration (such as a slight change to an IP address) to completely bring down a large software system.
  • The success of an action in a Complex System cannot be assumed even if it achieves the intended goal, because unexpected side-effects are common and the downside of the side-effects may outweigh the upside achieved by the intended goal. (Example: Changing the IP address to move a Web server into the same subnet as the database server may speed up access to the database - the intended goal - but moving the Web Server behind a firewall may render it unreachable - an unintended but lethal side-effect).
Hence, when dealing with complex systems there are benefits in experimentation. "Safe-fail Probes" are small-scale experiments that approach issues from different angles, in small and safe-to-fail ways. The intent of these probes is to approach issues in small, contained ways to allow emergent possibilities to become more visible. The emphasis is not on avoiding failure, but on monitoring the outcome and allowing ideas that are not useful to fail in small, contained ways, while the ideas that show desirable outcomes can be adopted and used more widely.  
See more on Safe-fail probes at: http://cognitive-edge.com/library/methods/safe-to-fail-probes/

The take-away lesson is this:

  • The outcome of acting on a complex system is rarely predictable. There are no repeating relationships between cause and effect, and the emergent behavior of a complex system is frequently unrelated to the actions of the component parts.
  • A complex system is input-sensitive - small changes may produce large payoffs.
  • Consequently complex systems sometimes offer big payoffs for small experiments.
  • The precise outcome of any given experiment is hard to predict, so managing this process requires that the outcome of each experiment is monitored. Successful experiments are expanded, while unsuccessful experiments are terminated.

Specific examples of safe-to-fail experiments might be:

  • At the developer level - adopting a new design pattern in a contained area of the project
  • On a larger scale - introducing a new methodology in a small Business Unit
Monitor the outcome and consider what effects the intervention had (both the outcome and the side-effects). If the undesirable impacts outweigh the desirable effects, the experiment is not a "failure"; the observed outcome and side-effects will guide the evolution of a path towards a more desirable result.
Note that the requirement for safe-to-fail has an important implication in Software Engineering: We need the capability to create test environments that are sufficiently "production like" to yield useful lessons, but sufficiently disconnected from the main codebase to ensure that side-effects of experiments have no impact. This leads to Agile practices such as "mocking" (http://en.wikipedia.org/wiki/Mock_object ).
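As a small sketch of that idea (using Python's unittest.mock; the payment gateway and functions are hypothetical), a mock stands in for the real external dependency so the code under experiment can run in a test environment without any side-effects reaching production systems:

    # Sketch: isolating code from a real external dependency with a mock.
    # charge_order and the gateway interface are hypothetical stand-ins.
    from unittest.mock import Mock


    def charge_order(gateway, order_id, amount):
        """Code under test: charges an order through an external payment gateway."""
        response = gateway.charge(order_id, amount)
        return response["status"] == "approved"


    def test_charge_order_reports_approval_without_touching_the_real_gateway():
        gateway = Mock()
        gateway.charge.return_value = {"status": "approved"}

        assert charge_order(gateway, "order-42", 99.95) is True
        gateway.charge.assert_called_once_with("order-42", 99.95)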

When To Use Experiments?


As a guide, if either of the following is true, then you are looking at a candidate for experiments:
  • You have identified an area that you want to exploit, but don't know how to exploit it. Keep the investment small until you have demonstrated that the upside of success is large, then focus on the identified upside. Areas such as automation and the use of coding patterns are examples of commonly identified areas where experiments might yield large benefits.
  • You have a job to do, but you don't know how to do it and the environment is large and complex. If the area is large and volatile, with a number of unknowns or risks, then the focus is on a broader picture - minimising risk and maximising opportunity by iteratively refining your strategy for reaching your intended destination.  Experiments that seek to navigate the unknown should be small probes, run iteratively, with the learning from each probe guiding the direction of the next. For example, when seeking to implement an Organisational Transition, such as an Agile Transformation, run your experiments in cycles:

When navigating the unknown, (e.g. Performing an Organisational Transition), run experiments in cycles, implementing what works and discarding what does not work.

Suiting The Experiment Type To the Situation.

It is often said that Fleming's discovery of penicillin was an accident. This is not entirely true. Fortuitous perhaps, but not a complete accident.

How it happened: Fleming was experimenting with growing bacteria in culture plates. Returning from holidays, Fleming discovered that a number of uncovered culture plates had been contaminated by a mould that blew in through a carelessly-opened window. Wherever the mould grew, the bacteria were absent. Fleming recognised the significance at once, collected and labelled the plates, and so began the study that would result in antibiotics - one of the greatest advances in health in the history of humanity.

Accident? Fleming had been looking for agents that would kill bacteria. He was working in a lab that was equipped to detect these agents and, most importantly, Fleming was working with the right mindset to recognise an agent when he saw one, even if it wasn't where and when he expected to see it. The contamination of the plates was fortuitous, but the right environment and mindset had been created and the experiments were running.

In deciding what type of experiment to run, ask yourself what upside you are trying to achieve. Define areas of focus where the cost can be small and the upside large.

Then make sure that your experimenters are equipped with not just the right equipment and information, but the right mindset. They should be looking for opportunities to achieve upside, not just attempting to achieve a particular outcome. 

Remember that you need to choose between two types of experiment:
1.        Experiments that focus on knowns - seeking to achieve an upside in a known high-value area (in Fleming's case - an agent that would kill bacteria was known to be a likely high-value discovery). 
2.        Experiments that seek to navigate the unknown - reducing risk and identifying areas that might yield high-value results. This type of experiment yields two things: clarity of direction and opportunities for future exploration.
 
The approach to each is similar, but with some notable differences. 
If you are seeking to navigate the unknown:
1.        Gather domain experts. 
2.        Clarify what areas might be of interest. Where are your unknowns?  
3.        Once you have identified your areas, gather your testing/UX/experimentation experts. What is the best tool to use in each area? (Wireframes to clarify thinking? Rapid prototyping to prove which options might work? Reconnaissance by coding to find out which approach demonstrates acceptable performance? Combinations of some or all? Or something else entirely?)
4.        Put together a team and set them up to do the experiments.
5.        Make the timebox for each experiment clear. This is not an open-ended investment.
6.        Make it clear that they are not just there to achieve a particular outcome, they are trying to learn lessons and identify risks and opportunities. 
7.        Design your first experiment.
8.        Implement the experiment.
9.        Once the experiment is done, identify the lessons. Have we achieved clarity of direction? Have risks/issues/opportunities for future exploration been identified?
10.        Retain the positive and implement it. Discard the negative and learn from it.
11.        Based on what you have learned, is there a business case for continuing to run experiments?
12.        If there is a business case, design your next experiment, and go to step 8. Repeat as required.

The Differences

  • Experiments in an area known to be likely to yield upside will tend to focus on identifying rewards in the specific area.
  • Experiments for navigating the unknown will tend to focus on identifying positive directions for further development or investigation and (equally importantly) directions that are unlikely to reward further investigation.

Further References:
EDD: Experiment Driven Development  
David Clarke at LAST Conference 2014: Managing Risk Exploiting Chaos

When is the Last Responsible Moment?

Agile frequently refers to the value in delaying decisions until "The Last Responsible Moment" (LRM), but when, exactly, is the Last Responsible Moment? The usual answer is: Late enough to let you get enough information for a correct decision, but early enough to avoid causing delays or added costs.

The Problem is.... We almost never get enough information for a guaranteed "Correct Decision"! Consequently, we sometimes leave decisions too late. When you try to defend your project from potential problems, your decision is similar to the decisions facing a military officer who faces possible hostile action by unknown forces:
1. Decide early: build extensive, all-encompassing defences at once and then stay vigilant. This exhausts your troops, but at least you have defences, right? But it may turn out you built the wrong defence. If so, your exhausted troops may need to change the defence to deal with the consequent problems as they arise, or you may need to abandon your position - so all that effort was wasted and your troops are forced into a running battle when they are already exhausted. Or there may be no attack at all, so again all the effort was wasted - your troops are exhausted and achieved nothing.
2. Delay the decision and continue gathering information until your sentries believe that they may have an approximate number and position of enemy troops that may be approaching your position. You may be too late to build an effective defence.... and you still don't have enough information to build a defence that you definitely won't need to change. However, although you still can't know for sure that you have reached the Correct Decision and your troops might not have time to implement your decision, at least you have better information, so if the troops get it built in time your defence might be about right, and thus the cost to change your defence to match the precise situation should be less.
3. Delay until the sound of claymores going off drowns out the screams of "Oh God, they're coming through the wire"! So now you have the maximum possible amount of information about enemy composition, size, equipment, tactics, disposition, support, intent and so on. Thus, although your decision will be delivered into a crisis and your troops may have no opportunity to implement your decision, at least there is a decent chance that it is the Correct Decision.

In shaping your decision, ask yourself three questions:
1.        Will a delay provide more information?
2.        What is the cost of delay?
3.        What is the cost of corrective action if you decide early and fix it later?
Delay has a cost, but it might also provide a benefit.

Know the Driver For Delay

What might drive the need to delay a decision? Here are some scenarios to consider:
  • I need to know something, but it is unknowable. Sometimes we cannot know what the "Correct Decision" is until we try. This is particularly true in complex environments, where the entire outcome of a decision may not be predictable in advance. If that is the case, there is no benefit in delay. This is a scenario in which Complexity Theory would advise designing low-cost Agile Experiments (prototyping, wireframe, walkthrough, "reconnaissance by coding", or something similar), which will guide a decision better than waiting and hoping for information that cannot be known.
  • I need to know something, and it is knowable, but not known yet. Frequently, although we don't yet have the information we need, we get to a point where the cost of delay is higher than the cost of change - which means that the "Last Responsible Moment" becomes "As Soon As Possible"! This is when we need to understand the "cost of delay" vs the "cost of change" - as this drives "The Last Responsible Moment".
  • I am not in charge of the decision. Sometimes we aren't in charge of the length of time a decision may take. For example, Product Owners are sometimes hard to pin down when we need requirements or priorities, because the PO is trying to get stakeholders to sign off and the stakeholders aren't sure what the complete, "correct decision" is, so they are frightened to commit to an answer in case they leave out a requirement. Yet we often find that they have a clear understanding of 80% of the answer - their hesitation is around uncertainty over a little added functionality.
If we have developers sitting doing nothing or involved in ineffective make-work while we wait, this fruitless wait for the "correct decision" can represent a very expensive delay! In this case refactoring code to incorporate minor updates from stakeholders is the less expensive option. We develop Iteratively, which allows our stakeholders to make decisions Iteratively! So, in this case, make sure your stakeholders understand the cost of delay versus the cost of change. Explain to them that sometimes we don't need the whole answer - part will do.
If this occurs, it is worth keeping your developers in the loop. Most developers dislike rework, but they dislike idleness and make-work even more! If they understand the situation, they are likely to be understanding about the need to do minor refactoring and rework.

Deciding When To Decide

As a guide, if the information you need is:
  • Unknowable. Then the Last Responsible Moment is now - using experiments to guide your decision.
  • Knowable, but the cost of delay is greater than the cost of change. Then the Last Responsible Moment is now - but with refinements, improvements and corrections later. 
  • Knowable, and the cost of change is greater than the cost of delay. Then the Last Responsible Moment is whenever you have enough data to maximise the chance of a correct decision. 
Deciding when to decide is a decision! Understanding what is driving the delay, and being able to estimate the Cost of Delay Vs the Cost of Change helps you decide which way to go.
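
Purely as an illustrative sketch, that guide can be summarised as a simple decision function:

    # Illustrative sketch only: the "deciding when to decide" guide as a function.
    def last_responsible_moment(knowable, cost_of_delay, cost_of_change):
        if not knowable:
            return "Decide now - use cheap experiments to guide the decision."
        if cost_of_delay > cost_of_change:
            return "Decide now - refine, improve and correct later."
        return "Delay - decide once you have enough data to maximise the chance of a correct decision."


    # Example: the information is knowable, but idle developers make delay the bigger cost.
    print(last_responsible_moment(knowable=True, cost_of_delay=50_000, cost_of_change=10_000))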