March 25, 2007

Common Barriers to Model-Based Automation

If modeling is as simple as the previous blog entry implies, then why isn’t everyone using model-based automated testing?

1. Model-based testing requires a change in thinking.

Most testers have been trained to transform mental models into explicit test scripts – not to document behavior in a machine-readable format. However, most testers will find that modeling is actually easier than defining and maintaining explicit test cases for automation.

2. Large models are very difficult to create and maintain.

Small additions to a model can easily trigger exponential growth in its size and complexity. This state explosion usually requires that large models be defined using code instead of tables. The large-model problem can be solved through the use of Hierarchical State Machines (HSMs) and state variables.

Most software states have hierarchical relationships in which child states inherit all the attributes of their parent states and add attributes specific to the child. Hierarchical state machines reduce redundant specification and allow behavior to be modeled in small pieces that can be assembled into the larger system. For example, the following HSM represents the same keyless entry system with fewer than half as many transitions defined. Actions that are possible from a state are also possible from all of its child states, and validation requirements defined for a parent apply to all of its children. This greatly reduces the size and complexity of the model. Large systems can be modeled by merging many small models.

[Diagram: hierarchical state machine for the keyless entry system]
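
As a rough sketch of how such a hierarchy might be written down, the fragment below uses assumed state names, actions, and parent/child relationships for the keyless entry example; it is an illustration, not the model from the diagram.

    # A minimal sketch of a hierarchical state machine for a keyless entry
    # system. State names, actions, and the hierarchy are illustrative
    # assumptions; the original diagram is not reproduced here.

    # Each state may name a parent; transitions defined on a parent apply
    # to all of its children.
    STATES = {
        "Unlocked":      {"parent": None},
        "Locked":        {"parent": None},
        "Armed":         {"parent": "Locked"},   # child of Locked
        "AlarmSounding": {"parent": "Armed"},    # child of Armed
    }

    # Transitions defined once on a parent instead of on every child.
    TRANSITIONS = {
        "Unlocked": {"press_lock": "Locked"},
        "Locked":   {"press_unlock": "Unlocked"},   # inherited by Armed, AlarmSounding
        "Armed":    {"open_door": "AlarmSounding"},
    }

    def allowed_actions(state):
        """Collect a state's own actions plus everything inherited from its parents."""
        actions = {}
        while state is not None:
            for action, target in TRANSITIONS.get(state, {}).items():
                actions.setdefault(action, target)
            state = STATES[state]["parent"]
        return actions

    print(allowed_actions("Armed"))
    # {'open_door': 'AlarmSounding', 'press_unlock': 'Unlocked'}

Because actions are looked up through the parent chain, a transition defined once on a parent never has to be repeated for its children.
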
Defining some state information as variables instead of explicitly named states can also reduce the state explosion; sometimes it is simply easier to capture a condition as a state variable than as a set of child states. These state variables can then be used to define guarded transitions: transitions that are only possible when a specified data condition is met. A requirement that all doors be closed before the example keyless entry system will arm the alarm may be specified as shown below. Without guarded transitions, modeling the difference in behavior between open and closed doors would require many new states and transitions.

[Diagram: guarded transition requiring all doors to be closed before arming]
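
A minimal sketch of a guarded transition follows, assuming a "doors_open" state variable; the action name and the guard are illustrative, not the original specification.

    # A guarded transition: the rule only fires when its guard predicate,
    # evaluated over the state variables, is satisfied. Names are assumed.
    GUARDED_TRANSITIONS = {
        ("Locked", "press_lock_twice"): {
            "target": "Armed",
            "guard": lambda state_vars: state_vars["doors_open"] == 0,
        },
    }

    def try_transition(current_state, action, state_vars):
        rule = GUARDED_TRANSITIONS.get((current_state, action))
        if rule is None:
            return current_state              # action not modeled for this state
        if rule["guard"](state_vars):
            return rule["target"]             # guard satisfied: the transition fires
        return current_state                  # guard failed: stay where we are

    print(try_transition("Locked", "press_lock_twice", {"doors_open": 1}))  # Locked
    print(try_transition("Locked", "press_lock_twice", {"doors_open": 0}))  # Armed

The guard keeps the "doors open" condition out of the state names, so the model does not need a separate child state for every combination of open doors.
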
3. The leading test tool vendors do not offer model-based testing tools.

Modeling is not a “best practice” promoted by the tool vendors. Tool vendors often dictate the way that their tools are used. This results in automation practices being defined to fit the tools instead of making the tools fit the desired approach. The good news is that many test automation tools – both commercial and open source – provide enough flexibility to build new frameworks on top of the built-in functionality.

4. Model-based testing looks complicated.

The model-based testing literature often makes modeling look more complicated than necessary. The truth is that modeling does not require expert mathematicians and computer scientists. A relatively simple framework can support complex test generation and execution with less manual work than most other automation methodologies.

March 24, 2007

Finite State Machines

Software behavior can be modeled using Finite State Machines (FSMs). FSMs are composed of states, transitions, and actions. Each state is a possible condition of the modeled system. Transitions are the possible changes in states. Actions are the events that cause state transitions. For example, the following FSM shows the expected behavior of a car keyless entry system.

[Diagram: finite state machine for the keyless entry system]

Images like the above are great for human use, but not for machines. State transitions and the actions that trigger them can also be defined in a table format that can be processed by a computer. The above FSM can be represented using the table below.

[Table: state/action/transition table for the keyless entry FSM]
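
As a rough, machine-readable sketch of such a table (with assumed state and action names):

    # Each entry maps (current state, action) to the expected next state.
    # The contents are assumptions standing in for the original table.
    TRANSITION_TABLE = {
        ("Unlocked",      "press_lock"):   "Locked",
        ("Locked",        "press_unlock"): "Unlocked",
        ("Locked",        "press_lock"):   "Armed",
        ("Armed",         "press_unlock"): "Unlocked",
        ("Armed",         "open_door"):    "AlarmSounding",
        ("AlarmSounding", "press_unlock"): "Unlocked",
    }

    assert TRANSITION_TABLE[("Locked", "press_unlock")] == "Unlocked"
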
The requirements for each state can also be defined using tables. The table below contains sample requirements for the example keyless entry system.
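
A similar sketch for per-state requirements, again with assumed names and checks standing in for the original table:

    # Validation requirements keyed by state; each check would be verified
    # whenever the model says the system should be in that state.
    STATE_REQUIREMENTS = {
        "Unlocked":      ["doors can be opened", "alarm is off"],
        "Locked":        ["doors cannot be opened", "alarm is off"],
        "Armed":         ["doors cannot be opened", "alarm is set"],
        "AlarmSounding": ["horn is sounding", "lights are flashing"],
    }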

March 23, 2007

Artificial Intelligence Meets Random Selection

Automated tests can be defined using models instead of scripting specific test steps. Tests can then be randomly generated from those models. The computer can even use the model to recover from many errors that would stop scripted automation. Although the computer cannot question the software like a human tester, the automation tool can report anything it encounters that deviates from the model. Thinking human beings can then adjust the model based on what they learn from the automated test’s results. This is automated model-based testing.
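
As a rough illustration, the sketch below generates a random test sequence by walking a transition-table model of the kind shown in the entries above; the model contents and names are assumptions for illustration only.

    import random

    # Walk the model at random, yielding the action to perform and the
    # state the model says should result.
    def random_walk(model, start, steps=20):
        """Yield (state, action, expected_next_state) steps chosen at random."""
        state = start
        for _ in range(steps):
            choices = [(action, nxt) for (s, action), nxt in model.items() if s == state]
            if not choices:
                break                      # dead end in the model
            action, expected = random.choice(choices)
            yield state, action, expected
            state = expected

    MODEL = {
        ("Unlocked", "press_lock"):   "Locked",
        ("Locked",   "press_unlock"): "Unlocked",
        ("Locked",   "press_lock"):   "Armed",
        ("Armed",    "press_unlock"): "Unlocked",
    }

    for state, action, expected in random_walk(MODEL, "Unlocked", steps=5):
        print(f"In {state}: do {action}, expect {expected}")
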

As with any explicit model, a model built for test automation is going to be less complex than the system it represents. It does not need to be complex to be useful. New information can be added to existing models throughout the testing process to improve test coverage.

Intelligent Design

All testing is model-based. Good tests require intelligent design. Testers use mental models of system behavior when creating and executing tests. Scripted tests are created from the designers’ mental models prior to test execution and do not change based on the results. Exploratory testing starts with a pre-conceived mental model that is refined as tests are executed. Whether scripted or exploratory, human testers are capable of applying information they learn during test execution to improve the testing. Computers cannot adjust scripted test cases during execution. What if automated tests could apply behavioral models to generate tests that go where no manual tester has gone before?

March 21, 2007

Why does test automation often fail to deliver?

Software Is Automation

Testers are continually asked to test more in less time. Test automation is often considered to be a solution for testing more in less time. Software is automation. It makes sense to automate the testing of software. However, functional black box test automation rarely does more in less time. Why does automation seldom deliver faster, better, and cheaper testing?


Common Test Automation Pitfalls

The following problems are often encountered during test automation projects.

1. Tests are difficult to maintain and manage.

Much GUI test automation attempts to replace just a single step of the manual testing process: test execution. This can work well when there is value in repeatedly executing the same steps and the application is stable enough to run the same script multiple times. However, in the real world, applications change and doing the same thing over and over again is rarely beneficial. A thinking human tester still needs to create the tests, code the automation, review the results, and update the automation code every time the system changes. The development, maintenance, and management of automated test scripts often require more time and money than manual test execution.

2. Test results are difficult to understand.

Test results need to be manually reviewed. Automated test results often do not contain enough information to determine what failed and why. Reviewing results takes time. Manually repeating the tests to determine what really happened takes even longer. Insufficient results reporting can easily negate any advantages of automation.

3. Application changes and bugs can prevent tests from completing.

Application changes and bugs that will not stop a human tester can easily stop automated test execution in its tracks due to cascading failures. A single failed step in an automated test can easily prevent execution of all later steps. Updating and restarting tests every time they encounter the unexpected is not an improvement over manual testing.

4. Tests retrace the same steps over and over, and don’t find new bugs.

Scripted automation will repeat the same steps every time it is run. It will not encounter bugs that are not on the pre-cleared scripted path. Sometimes consistency is good, but consistency will not find new bugs. Many testers make the mistake of believing that their automated tests are doing the same things as a good manual tester. Scripted automation will only do what it is coded to do.

Functional black box test automation frequently requires more manual work to do less, and do it worse, than a human tester. This is the result of attempting to duplicate manual testing with automation. Attempting to duplicate manual testing processes not only fails to reduce costs or improve coverage; it creates additional manual work.

Useful software rarely mirrors the manual process that it replaces. Word processors are much more than virtual typewriters. Spreadsheet programs are more than virtual calculators. The process needs to change to take advantage of the strengths of the machine. People and machines have different strengths. A machine cannot think. A human tester is unlikely to be happy running tests all night or performing tedious calculations. Most software is built to assist human users, not replace them. Test automation should assist human testers, not attempt to replace them.


Rules of Test Automation

Arguing that computers cannot think like a human being, James Bach proposed the following test automation rules in a blog entry titled Manual Tests Cannot Be Automated.
Rule #1
A good manual test cannot be automated.

Rule #1B
If you can truly automate a manual test, it couldn’t have been a good manual test.

Rule #1C
If you have a great automated test, it’s not the same as the manual test that you believe you were automating.


A great automated test is one that assists testers by doing what is not easily done manually without creating more manual work than it replaces. Great automation goes beyond test execution by assisting with test generation and providing useful information to manual testers. Combining Model-Based Testing (MBT) with a tester-friendly automation framework is one way to improve the effectiveness of test automation.

March 18, 2007

SQuAD 2007 Conference Presentation

Click here to download my SQuAD conference presentation slides.

Please ask questions using the blog's comment feature or email them to me at ben@qualityfrog.com.

Ben

Software is more reliable than people!

Software is 100% reliable. It does not break. It does not wear out.

We can depend on software to do exactly what it is coded to do, every time it does it.

This is why software quality stinks!

The consistent repeatability in software is both a blessing and a curse. We can depend on software processing the same data in the same way every time. We can also rely on software to not do what it does not do every time.

The repeatability of software is both its greatest strength and weakness. A simple mistake in design or implementation will forever repeat itself each time a computer program runs.

Over 50 years ago, three mathematicians wrote:


Those who regularly code for fast electronic computers will have learned from bitter experience that a large fraction of the time spent in preparing calculations for the machine is taken up in removing the blunders that have been made in drawing up the programme. With the aid of common sense and checking subroutines the majority of mistakes are quickly found and rectified. Some errors, however, are sufficiently obscure to escape detection for a surprisingly long time.

[R.A. Brooker, S. Gill, D.J. Wheeler, "The Adventures of a Blunder", Mathematical Tables and Other Aids to Computation, 1952]

The source of the problem is people. We are not reliable. We make mistakes. Software amplifies and repeats our successes and our mistakes equally well.

When our software encounters the unexpected, errors occur. As developers and testers, we need to expect the unexpected. Think about how requirements might be misunderstood and clarify any ambiguity. Think about how users might misuse a system -- either accidentally or intentionally -- and ensure that the system can handle that user behavior.

The software we use today is exponentially more complex than the software being developed 50 years ago. There have got to be more opportunities for blunders in today's software than there were in software development half a century ago. And with that complexity some errors become even more obscure and are more likely to escape detection by developers and testers. Users, however, seem to easily encounter these errors.


A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools.

-- Douglas Adams

March 13, 2007

Expecting the unexpected. Part 2

How can we create automation that can deal with the unexpected?
The first step is to create test automation that "knows" what to expect. Most GUI test automation is built by telling the computer what to do instead of what to expect. Model-Based Automated Testing goes beyond giving the computer specific test steps to execute.

Model-Based Testing is testing based on behavioral models instead of specific test steps. Manual testers design and execute tests based on their mental models of a system's expected behavior.

Automated tests can also be defined using models instead of scripting specific test steps. Tests can then be randomly generated from those models -- by the computer instead of a manual tester. The computer can even recover from many errors that would stop traditional test automation because it knows how the system is expected to behave. And by knowing how it is expected to behave, it can detect unexpected behavior. Unexpected does not necessarily mean wrong behavior. The behavior could be wrong or it could be something that was not included in the model. The computer can report the unexpected behavior to human testers for investigation and future updates to the model.

For example, if one path to functionality to be tested fails, the MBT execution engine can attempt to access that functionality by another path defined in the model.
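
One way to picture that recovery is a search for another route through the model. The sketch below uses a breadth-first search over an assumed transition-table model; it is an illustration of the idea, not the author's execution engine.

    from collections import deque

    # If the preferred path to a target state fails, look for another route
    # through the model and try that instead.
    def find_path(model, start, goal):
        """Return a list of actions leading from start to goal, or None."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            state, path = queue.popleft()
            if state == goal:
                return path
            for (s, action), nxt in model.items():
                if s == state and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [action]))
        return None

    MODEL = {
        ("Unlocked", "press_lock"):   "Locked",
        ("Locked",   "press_lock"):   "Armed",
        ("Unlocked", "hold_lock"):    "Armed",   # an assumed alternate route
        ("Armed",    "press_unlock"): "Unlocked",
    }

    print(find_path(MODEL, "Unlocked", "Armed"))   # ['hold_lock']
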

Of course, there will always be some cascading failures that stop both automated and manual tests. Even so, MBT inherently provides better error handling than scripted test automation.

February 26, 2007

Expecting the unexpected. Part 1

One of the expectations for GUI test automation is unattended running of tests. However, this is often difficult to accomplish. Unexpected application behavior can stop an automated test in its tracks. Manual intervention is then required to get the script running again. Some automation tools offer run-time options to help the user prod the test along. Other tools require that the script or system under test be fixed before test execution can continue. The process of running a partial test, fixing the script (or waiting for an application fix), and then running another partial test only to find another script-stopping change can be time consuming. This process often takes longer than manual testing.

The problem is that scripted automation cannot adjust to application issues like a thinking manual tester. The automation script can only handle what the scripter told it to expect. Some automation tools offer complex exception handling features that allow users to define expected unexpected behavior. Therein lies the problem: someone has to expect and code for the unexpected. There will always be unexpected unexpected behavior.

How can we create automation that can deal with the unexpected?

February 13, 2007

Top ten web application security flaws

The topic of tonight's SQuAD meeting was software security. Mike Walters' presentation, "Integration of Security into the SDLC", highlighted the need to implement and validate security throughout the software development lifecycle. Mike mentioned that security is moving from the realm of "non-functional" testing to functional testing. Security has become an important functional requirement. Mike stressed the need to define security requirements at the start; get developer buy-in; and provide developers with the tools and training to build secure software. The risks of poor security are often too great to ignore.

Mike recommended the OWASP Top Ten Project as a great starter list of web application security threats to consider.
    • A1 Unvalidated Input. Information from web requests is not validated before being used by a web application. Attackers can use these flaws to attack backend components through a web application.
    • A2 Broken Access Control. Restrictions on what authenticated users are allowed to do are not properly enforced. Attackers can exploit these flaws to access other users' accounts, view sensitive files, or use unauthorized functions.
    • A3 Broken Authentication and Session Management. Account credentials and session tokens are not properly protected. Attackers that can compromise passwords, keys, session cookies, or other tokens can defeat authentication restrictions and assume other users' identities.
    • A4 Cross Site Scripting. The web application can be used as a mechanism to transport an attack to an end user's browser. A successful attack can disclose the end user's session token, attack the local machine, or spoof content to fool the user.
    • A5 Buffer Overflow. Web application components in some languages that do not properly validate input can be crashed and, in some cases, used to take control of a process. These components can include CGI, libraries, drivers, and web application server components.
    • A6 Injection Flaws. Web applications pass parameters when they access external systems or the local operating system. If an attacker can embed malicious commands in these parameters, the external system may execute those commands on behalf of the web application.
    • A7 Improper Error Handling. Error conditions that occur during normal operation are not handled properly. If an attacker can cause errors to occur that the web application does not handle, they can gain detailed system information, deny service, cause security mechanisms to fail, or crash the server.
    • A8 Insecure Storage. Web applications frequently use cryptographic functions to protect information and credentials. These functions and the code to integrate them have proven difficult to code properly, frequently resulting in weak protection.
    • A9 Application Denial of Service. Attackers can consume web application resources to a point where other legitimate users can no longer access or use the application. Attackers can also lock users out of their accounts or even cause the entire application to fail.
    • A10 Insecure Configuration Management. Having a strong server configuration standard is critical to a secure web application. These servers have many configuration options that affect security and are not secure out of the box.

The next time you are involved in designing, coding, or testing a web application, consider these threats.

February 12, 2007

Best Practices Aren’t

The first two of the Seven Basic Principles of the Context-Driven School of software testing are:

1. The value of any practice depends on its context.
2. There are good practices in context, but there are no best practices.


As a former quality school gatekeeper, I understand the value of standards – in both products and processes. However, I am concerned by the current “best practices” trends in software development and testing. The rigidity that we demand in product standards can hurt in process standards. Even the CMM (which is often viewed as a rigid process) has “Optimizing” as the highest level of maturity. A mature process includes continuous learning and adjustment of the process. No process should lock us into one way of doing anything.

Nearly 100 years ago, the industrial efficiency pioneer Frederick Taylor wrote “among the various methods and implements used in each element of each trade there is always one method and one implement which is quicker and better than any of the rest”.

I do not disagree that there may be a best practice for a specific task in a specific context. Taylor broke existing methods and implements (tools) down into small pieces and scientifically evaluated them in search of areas for improvement. The problem is that today's best practices are often applied as one-size-fits-all processes. The best practice for one situation is not necessarily the best for all other contexts, and a "best practice" today may no longer be the best practice tomorrow. Applying a practice universally is actually the opposite of what Taylor did. Consultants and tool vendors have discovered that there is money to be made taking "best practices" out of one context and applying them to all other contexts. It is harder, and likely less profitable, for the "experts" to seek out the best practices for a specific context. Taylor sought out and applied best practices to small elements; many of today's "best practices" are applied at a universal level.

I am amazed by what appears to be widespread acceptance of “best practices” by software testers. As testers, it is our job to question. We make a living questioning software. We need to continually do the same for practices. Test your practices.

When presented with a best practice, consider contexts in which the practice is not the best. The broader the scope claimed for a best practice, the more situations it is likely to fit poorly. Don’t limit your toolbox to a single practice or set of practices. Be flexible enough to adjust your processes as the context demands. Treat a process as an example: apply it where it fits, and be willing to deviate from it -- or apply an entirely different process -- when it does not fit the context.

No process should replace human intelligence. Let process guide you when it applies. Don’t let a process make decisions for you.

Seek out continuous improvement. Don't let process become a rut.

Process is best used as a map, not as an auto-pilot.

February 8, 2007

When is a bug a feature?


Ever find a bug that "can't" be fixed?

I am not referring to fixes prevented by technical or schedule limitations. I am referring to bugs that have become expected features.

Here are two...

Experience #1
I once found a bug in communications software that would cause most of the systems likely to be on the other end of a connection to crash. I don't mean the failure of a single process. This was a memory allocation issue that could take down entire systems that were essential to their owners. An argument could be made that the bug was really in all those other systems and not in the system that I was testing. However, most of the large number of systems that made up the pre-existing installed base could not create the condition that would make another system fail. Due to the large number of systems (from a variety of vendors) it was determined that the condition that caused the failure should be prevented by new systems (and future releases of existing systems) instead of immediately fixing all the old systems. When I discovered the problem on the new system, the developers were willing and able to make a quick fix but the business said no. The reason? The manuals had already been printed and the application could not be changed in any way that changed the documented user experience.

Experience #2
After being newly assigned to test a product that had been in "maintenance" mode for many years, I discovered and reported numerous bugs. There were also new developers assigned to this product. The developers and I were allowed to work on these long-standing defects because the new developers needed a chance to familiarize themselves with the code before working on upcoming enhancements. One of the bugs we fixed was a yes/no prompt that required that users select "no" when they meant "yes", and "yes" when they meant "no". To both me and the new development team, this was a major problem. However, after shipping the "new and improved" release, we received requests from a customer that the yes/no prompt be put back the way it was. The reason? The customer had created their own documentation and training for their users. The customer was teaching their users that "yes" means "no" and "no" means "yes". We had to back out this change to keep the customer happy.

Some lessons I learned from these experiences are:
1) There are often bigger issues involved in software development than removing all the bugs.
2) Users learn to work around bugs and treat them as features. This can create a situation in which the fix to a bug becomes a new bug. Consult users before making changes that impact their workflow -- especially users that write big checks.
3) Delaying a fix for a bug that impacts how users behave may prevent the bug from ever being fixed. Had the yes/no issue been fixed soon after it was first introduced, the customer would have been happy with the fix. You may end up needing to manage two sets of code: one for customers that want the bug fixed and one for customers that want the bad behavior to stay.
4) Respectfully ask questions when direction doesn't make sense. Work with stakeholders to come up with creative solutions. In the case of the pre-printed documentation, development was able to come up with a creative solution that did not impact the user interface or documentation.

What "non-fixable" bugs have you encountered? Was a solution found?

February 7, 2007

People, Monkeys, and Models

Methods I have used for automating “black box” software testing…


I have approached test automation in a number of different ways over the past 15 years. Some have worked well and others have not. Most have worked when applied in the appropriate context. Many would be inappropriate for contexts other than that in which they were successful.

Below is a list of methods I’ve tried in the general order that I first implemented them.

Notice that I did not start out with the record-playback test automation that is demonstrated by tool vendors. The first test automation tool I used professionally was the DOS version of Word Perfect. (Yes, a word processor as a test tool. Right now, Excel is probably the one tool I find most useful.) Word Perfect had a great macro language that could be used for all kinds of automated data manipulation. I then moved to Pascal and C compilers. I even used a pre-HTML hyper-link system called First Class to create front ends for integrated computer-assisted testing systems.

I had been automating tests for many years before I saw my first commercial GUI test automation tool. My first reaction to such tools was something like: "Cool. A scripting language that can easily interact with the user interfaces of other programs."

I have approached test automation as software development since the beginning. I've seen (and helped recover from) a number of failed test automation efforts that were implemented using the guidelines (dare I say "Best Practices"?) of the tools' vendors. I had successfully implemented model-based testing solutions before I knew of keyword-driven testing (as a package by that name). I am currently using model-based test automation for most GUI test automation: including release acceptance and regression testing. I also use computer-assisted testing tools to help generate test data and model applications for MBT.

I've rambled on long enough. Here's my list of methods I've applied in automating "black box" software testing. What methods have worked for you?

Computer-assisted Testing
· How It Works: Manual testers use software tools to assist them with testing. Specific tasks in the manual testing process are automated to improve consistency or speed.
· Pros: Tedious or difficult tasks can be given to the computer, and a little coding effort greatly benefits testers. A thinking human being is involved throughout most of the testing process.
· Cons: A human being is involved throughout most of the testing process.

Static Scripted Testing
· How It Works: The test system steps through an application in a pre-defined order, validating a small number of pre-defined requirements. Every time a static test is repeated, it performs the same actions in the same order. This is the type of test created using the record and playback features in most test automation tools.
· Pros: Tests are easy to create for specific features and to retest known problems. Non-programmers can usually record and replay manual testing steps.
· Cons: Specific test cases need to be developed, automated, and maintained. Regular maintenance is required because most automated test tools are not able to adjust for minor application changes that may not even be noticed by a human tester. Test scripts can quickly become complex and may even require a complete redesign each time an application changes. Tests only retrace steps that have already been performed manually. Tests may miss problems that are only evident when actions are taken (or not taken) in a specific order. Recovery from failure can be difficult: a single failure can easily prevent testing of other parts of the application under test.

Wild (or Unmanaged) Monkey Testing
· How It Works:
The automated test system simulates a monkey banging on the keyboard by randomly generating input (key presses and mouse moves, clicks, drags, and drops) without knowledge of available input options. Activity is logged, and major malfunctions such as program crashes, system crashes, and server/page not found errors are detected and reported. (A rough sketch follows this entry.)
· Pros: Tests are easy to create, require little maintenance, and given time, can stumble into major defects that may be missed following pre-defined test procedures.
· Cons: The monkey is not able to detect whether or not the software is functioning properly. It can only detect major malfunctions. Reviewing logs to determine just what the monkey did to stumble into a defect can be time consuming.
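
A minimal sketch of a wild monkey driver is shown below. The send_keys and click callables are hypothetical stand-ins for whatever GUI automation tool is driving the application.

    import random
    import string

    # Purely random input with no knowledge of the application under test.
    # Every action is logged so a crash can later be traced back to the
    # steps that preceded it.
    def wild_monkey(send_keys, click, screen_width, screen_height,
                    iterations=1000, log=print):
        for step in range(iterations):
            if random.random() < 0.5:
                keys = "".join(random.choices(string.printable,
                                              k=random.randint(1, 10)))
                log(f"step {step}: typing {keys!r}")
                send_keys(keys)
            else:
                x = random.randrange(screen_width)
                y = random.randrange(screen_height)
                log(f"step {step}: clicking at ({x}, {y})")
                click(x, y)
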

Trained (or Managed) Monkey Testing
· How It Works: The automated test system detects available options displayed to the user and randomly enters data and presses buttons that apply to the detected state of the application.
· Pros: Tests are relatively easy to create, require little maintenance, and easily find catastrophic software problems. May find errors more quickly than an unsupervised monkey test.
· Cons: Although a trained monkey is somewhat selective in performing actions, it also knows nothing (or very little) about the expected behavior of the application and can only detect defects that result in major application failures.

Tandem Monkey Testing
· How It Works:
The automated test system performs trained monkey tests, in tandem, in two versions of an application: one performing an action after the other. The test tool compares the results of each action and reports differences.
· Pros: Specific test cases are not required. Tests are relatively easy to create, require little maintenance, and easily identify differences between two versions of an application.
· Cons: Manual review of differences can be time consuming. Due to the requirement of running two versions of an application at the same time, this type of testing is usually only suited for testing through web browsers and terminal emulators. Both versions of the application under test must be using the same data – unless the data is the subject of the test.

Data-Reading Scripted Testing
· How It Works: The test system steps through an application using pre-defined procedures with a variety of pre-defined input data. Each time the test is executed, the same procedures are followed; however, the input data changes. (A sketch follows this entry.)
· Pros: Tests are easy to create for specific features and to retest known problems. Recorded manual tests can be parameterized to create data-reading static tests. Performing the same test with a variety of input data can identify data-related defects that may be missed by tests that always use the same data.
· Cons: All the development and maintenance problems associated with pure static scripted tests still exist with most data-reading tests.
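
A minimal sketch of the data-reading idea, assuming a CSV file of inputs and a hypothetical run_login_step driver function:

    import csv

    # The steps never change; only the data rows do. The CSV columns and
    # the run_login_step driver are hypothetical placeholders.
    def run_data_driven_tests(csv_path, run_login_step):
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                actual = run_login_step(row["username"], row["password"])
                expected = row["expected_result"]
                status = "PASS" if actual == expected else "FAIL"
                print(f"{status}: {row['username']!r} -> {actual} "
                      f"(expected {expected})")
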


Model-Based Testing
· How It Works:
Model-based testing is an approach in which the behavior of an application is described in terms of actions that change the state of the system. The test system can then dynamically create test cases by traversing the model and comparing results of each action to the action’s expected result state.
· Pros: Relatively easy to create and maintain. Models can be as simple or complex as desired. Models can be easily expanded to test additional functionality. There is no need to create specific test cases because the test system can generate endless tests from what is described in the model. Maintaining a model is usually easier than managing test cases (especially when an application changes often). Machine-generated “exploratory” testing is likely to find software defects that will be missed by traditional automation that simply repeats steps that have already been performed manually. Human testers can focus on bigger issues that require an intelligent thinker during execution. Model-based automation can also provide information to human testers to help direct manual testing.
· Cons: It requires a change in thinking. This is not how we are used to creating tests. Model-based test automation tools are not readily available.

Keyword-Driven Testing
· How It Works:
Test design and implementation are separated. Use case components are assigned keywords. Keywords are linked to create test procedures. Small components are automated for each keyword process. (A sketch follows this entry.)
· Pros: Automation maintenance is simplified. Coding skills are not required to create tests from existing components. Small reusable components are easier to manage than long recorded scripts.
· Cons: Test cases still need to be defined. Managing the process can become as time consuming as automating with static scripts. Tools to manage the process are expensive. Cascading bugs can stop automation in its tracks. The same steps are repeated each time a test is executed. (Repeatability is not all it’s cracked up to be.)
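
A minimal sketch of keyword-driven execution, with hypothetical keywords mapped to small automated components:

    # Each keyword maps to a small automated component, and a test is just
    # a list of keywords with arguments. The keywords and the print
    # stand-ins are hypothetical.
    KEYWORDS = {
        "open_application": lambda app: print(f"launching {app}"),
        "press_button":     lambda name: print(f"pressing {name}"),
        "verify_state":     lambda expected: print(f"verifying state is {expected}"),
    }

    TEST_CASE = [
        ("open_application", "keyless_entry_simulator"),
        ("press_button", "lock"),
        ("verify_state", "Locked"),
    ]

    def run_keyword_test(test_case):
        for keyword, argument in test_case:
            KEYWORDS[keyword](argument)

    run_keyword_test(TEST_CASE)
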

February 6, 2007

Slogans are models.

Harry Robinson posted an answer to inquiries about the Google Testing Blog's slogan: "Life is too short for manual testing." Some were concerned that the slogan implied that Google does not value manual and exploratory testing. I too had such concerns.

Harry pointed out that the slogan is just a slogan and that life really is too short to try all the combinations that might expose important bugs.

This got me to thinking about slogans as models. A slogan is really a model of an idea. It is not complete. It is simpler than the thing it describes.

Consider the following advertising slogans:
  • "The ultimate driving machine"
  • "When it absolutely, positively has to be there overnight."
  • "Finger lickin' good."
  • "Let your fingers do the walking."
  • "Reach out and touch someone."
  • "The quicker picker-upper."
  • "Have if your way."
  • "It's everywhere you want to be."
  • "Betcha can't eat just one."
These slogans bring to mind attributes of the companies and their products that are not an explicit part of the slogan. I don't even have to mention the companies or their products. This is your mind accessing your mental model of the company and products that the slogan represents.

In addition to the more detailed model invoked in your mind, it should not be difficult to find faults with these slogans. The slogans are incomplete; yet they are not useless.

Slogans demonstrate both the usefulness and potential complexity of models. A model does not need to be complete to be useful.

So, how does this apply to software testing ... and test automation?

When we develop test cases or perform exploratory testing we are implementing our mental models. When we execute tests, we (hopefully) learn more about the system under test and update our mental models.

In the same way, explicit models used for model-based test automation can be refined after each test execution. There is no need to model all the possible details before the first test run. Running tests based on incomplete models can provide valuable information about your test subject. It can validate or disprove your assumptions. Results from an incomplete model can help lead you to other useful tests -- both manual and automated.

Investigate using Hierarchical State Machines to simplify model definition and maintenance.

Build your models one step at a time.