Although Brian Marick was not the originator of the concept, I first heard about Soap Opera Tests from him. Rather than covering a single, simple scenario, a Soap Opera Test exaggerates and complicates the scenario to push the system and see where failures can occur. This gets around a problem often seen in Agile projects, where the team tries to simplify the problem domain by ignoring what could be considered edge cases and addresses only the simple scenarios.
The lens of a Soap Opera can be useful for reviewing an application's test suite in a way that goes beyond the simplistic code coverage that is often reported from unit tests and component tests within a deployment pipeline:
How many tests (outside of unit tests) have a trivial sequence of setup, do action, check result, teardown (or, to use the Agile terms, how many tests are of the form Given, When, Then) rather than a connected sequence of transactions that represents a complex scenario?
For parameterized tests, how many are truly distinct tests rather than just equivalent values that exercise the exact same code path?
Are the System tests already covered by the Component level tests already implemented by the developers? (Typically the developer written tests may consider some possible failures, but miss others)
Do the System tests touch multiple parts of the architecture as part of a test scenario? (This is where a Soap Opera mindset helps, making sure that the test addresses what happens at team and component boundaries.)
Do the System tests address the full scope of the system and cover all interacting systems? (A common failing is that of not testing that the data replicated to the associated data lake/swamp/warehouse accurately represents the system data.)
Overall, whenever evaluating a test, it is useful to know what risk it is addressing. Ideally any descriptive text included in the automated test case should include information about the motivation for the test, why it is important and the consequences of skipping the test. My take is that System tests should not just repeat what can already be done by unit and component level tests (e.g. view and controller tests in Phoenix Testing terminology); they have to go beyond those simple scenarios and probe the interfaces between the various components.
Basically, all tests have to answer the economic question: what is the value of this test case?
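To make the contrast concrete, here is a minimal sketch of a Given/When/Then style test next to a Soap Opera style test. The Shop class is a hypothetical toy system invented purely for illustration, and the docstring on the second test records the risk it addresses, as suggested above.

```python
class InsufficientStock(Exception):
    """Raised when an order asks for more units than are on hand."""


class Shop:
    """Hypothetical toy system under test: tracks stock, takings and orders."""

    def __init__(self, stock, price):
        self.stock = stock        # units on hand
        self.price = price        # price per unit
        self.takings = 0          # money received so far
        self.orders = {}          # order id -> units still held by the customer
        self._next_id = 1

    def order(self, units):
        if units > self.stock:
            raise InsufficientStock(f"wanted {units}, only {self.stock} left")
        self.stock -= units
        self.takings += units * self.price
        order_id = self._next_id
        self._next_id += 1
        self.orders[order_id] = units
        return order_id

    def refund(self, order_id, units):
        held = self.orders[order_id]
        if units > held:
            raise ValueError("cannot return more than was bought")
        self.orders[order_id] = held - units
        self.stock += units
        self.takings -= units * self.price


def test_simple_order():
    """Given 5 in stock, when 2 are ordered, then 3 remain."""
    shop = Shop(stock=5, price=10)
    shop.order(2)
    assert shop.stock == 3


def test_soap_opera_order():
    """Risk: stock and takings drift apart when orders, refused orders and
    partial returns are interleaved. Skipping this test leaves only the
    happy path covered."""
    shop = Shop(stock=5, price=10)
    first = shop.order(3)
    second = shop.order(2)            # the shop is now sold out
    try:
        shop.order(1)                 # must be refused
    except InsufficientStock:
        pass
    else:
        raise AssertionError("order should have been refused")
    # The refused order must not have had any side effects.
    assert shop.stock == 0 and shop.takings == 50
    shop.refund(first, 2)             # partial return frees up stock
    shop.order(2)                     # which a third customer buys at once
    shop.refund(second, 2)            # full return of the second order
    held = sum(shop.orders.values())
    assert shop.stock + held == 5             # no units created or lost
    assert shop.takings == held * shop.price  # money matches goods held


test_simple_order()
test_soap_opera_order()
```

The second test costs more to write and maintain, so its docstring has to justify that cost.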
CUPID is Dan’s response to the SOLID principles and back story. Rather than another set of principles, Dan instead chose to focus on the properties of the software.
Composable – code that works well with others and does not have too many external dependencies that make it harder to use in another context, ideally with Intention Revealing terminology
Unix philosophy – related to the composability property, does one thing well and works well with others to build a larger solution
Predictable – or as the saying goes, “does what it says on the tin.” Dan calls this a generalization of Testability, it should behave as expected, be deterministic and observable
Idiomatic – naturally fits in with the way code is written in the implementation language, so for example in Python, rather than open, write and then close a text file, the natural way to write this is as below, where Python automatically handles the closing of the file
with open("file.txt", 'w') as textfile:
    textfile.write("hello world!")
Domain based – uses words and language in a way that would be familiar to practitioners in that domain.
When working with interpreted languages like Ruby, Elixir and Python it is great to use the REPL to discover the capabilities of the various variables that you are dealing with.
Ruby uses irb, Elixir uses iex and, to be different, Python jumps directly into the interactive prompt using python. In each of these you have the full power of the language to use whatever libraries you have installed by just typing code at the relevant prompt. So at the python prompt you could do the following to see how Playwright interacts with the browser, using code borrowed from an earlier post.
from playwright.sync_api import sync_playwright
playwright = sync_playwright().start()
browser = playwright.chromium.launch()
page = browser.new_page()
page.goto("https://www.selenium.dev/")
page.click("#main_navbar :text('blog')")
title = page.title()
print(title)
The nice thing with each of these REPLs is that they allow you to see the type of the object and the associated attributes and methods, and hence get a better understanding of the library by trying things out and getting immediate success or failure - with an associated error message and stack dump, immediately followed by the REPL prompt for you to try again. Amusingly this even works for overly complex APIs like the Amazon Boto3 python library that you need to interact with the AWS services.
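As a small sketch of that kind of exploration, the lines below can be typed one at a time at the python prompt. The pathlib.Path object is just a stand-in chosen because it ships with Python; the same calls work on a Playwright page or a Boto3 client.

```python
import inspect
from pathlib import Path

target = Path(".")          # any object will do; Path is only a stdlib stand-in

print(type(target))         # the concrete class of the object

# dir() lists every attribute and method; filter out the private names.
public_names = [name for name in dir(target) if not name.startswith("_")]
print(public_names[:8])

print(callable(target.exists))          # methods are callable, plain attributes are not
print(inspect.getdoc(target.exists))    # the docstring; help(target.exists) shows more

# A wrong guess gives an immediate error message, and then you try again.
try:
    target.no_such_method()
except AttributeError as error:
    print(error)
```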
Normally I avoid any hint of political comment, but this just hit the sweet spot of asking who is influencing our wetware: how do we decide what to care about and what to argue about?
When using the playwright codegen utility, it provides a nice preview of the available selector when hovering the mouse over any part of the web page. To try it with the Phoenix LiveView default application, start it with the command
> playwright codegen http://localhost:4000/
and after navigating to the LiveDashboard, the selector for the refresh speed shows up in the chromium browser
It also does a good job of generating some sample code that can then be copied into a pytest test case for future reuse
# Click text=Ports
# with page.expect_navigation(url="http://localhost:4000/dashboard/ports"):
with page.expect_navigation():
    page.click("text=Ports")
# Select 2
page.select_option("select[name=\"refresh\"]", "2")
Note that it will delay the script with expect_navigation until the Ports page is displayed - although it is not waiting for a specific url unlike the commented out part of the code.
Good news out of the UK: a proposal for a new Automated Vehicle Act that transfers liability from the person in the driving seat to the “Authorised Self-Driving Entity” (ASDE) (aka the Manufacturer). The person in the driving seat becomes the “user-in-charge” (UIC), responsible for the condition of the vehicle, while the ASDE is responsible for “the way the vehicle drives, ranging from dangerous or careless driving, to exceeding the speed limit or running a red light”.
Proposal also covers “no user-in-charge” (NUIC) where any occupants are merely passengers. “Responsibilities for overseeing the journey will be undertaken by an organisation, a licensed NUIC operator.”
A key new part of this idea is the prevention of misleading marketing:
The distinction between driver assistance and self-driving is crucial. Yet many drivers are currently confused about where the boundary lies. This can be dangerous. This problem is aggravated if marketing gives drivers the misleading impression that they do not need to monitor the road while driving - even though the technology is not good enough to be self-driving.
An ASDE is the vehicle manufacturer or software developer who puts an AV forward for authorisation. Our proposals provide some flexibility over the identity of the ASDE: it may be a vehicle manufacturer, or a software developer, or a partnership between the two. However, the ASDE must show that it was closely involved in assessing the safety of the vehicle. It must also be of good repute and have sufficient funds to respond to regulatory action (including organising a recall).
The onus will be on the ASDE to show that the vehicle meets the tests for authorisation. As a minimum, the ASDE would be expected to present evidence of approval, a safety case and an equality impact assessment.
Obviously the devil will be in the details, but this is a massive change in the way that software is covered by the law. Most software falls under the category whereby the developers basically disclaim all responsibility for the operation of the software, but this proposed framework changes that. Even if the vehicle requests a handover to the person in the driving seat, the ASDE remains responsible if the Automated Driving System caused the issue, their example being
While in self-driving mode, an automated vehicle turns into a one-way street in the wrong direction. The user-in-charge takes over but is unable to avoid a collision. Alternatively, no collision takes place, but in the moment the user-in-charge takes over, they are driving in the wrong direction and may be guilty of an offence on that basis.
Terry Pratchett’s Discworld series has many delightful sayings and ideas, and it is nice to hear that the “Sam Vimes ‘Boots’ theory of socio-economic unfairness” has inspired some action in Roundworld: a new price index, one that
will document the disappearance of the budget lines and the insidiously creeping prices of the most basic versions of essential items at the supermarket
“The reason that the rich were so rich, Vimes reasoned, was because they managed to spend less money,” wrote Pratchett. “Take boots, for example. He earned thirty-eight dollars a month plus allowances. A really good pair of leather boots cost fifty dollars. But an affordable pair of boots, which were sort of okay for a season or two and then leaked like hell when the cardboard gave out, cost about ten dollars. Those were the kind of boots Vimes always bought, and wore until the soles were so thin that he could tell where he was in Ankh-Morpork on a foggy night by the feel of the cobbles. But the thing was that good boots lasted for years and years. A man who could afford fifty dollars had a pair of boots that’d still be keeping his feet dry in ten years’ time, while a poor man who could only afford cheap boots would have spent a hundred dollars on boots in the same time and would still have wet feet.”
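The arithmetic behind the quote can be sketched in a few lines. The prices and the ten-year horizon come from the quote. The one-year lifetime for the cheap boots is an assumption, picked because it reproduces the hundred dollars that Pratchett mentions.

```python
import math


def total_spend(price, lifetime_years, horizon_years):
    """Money spent keeping one pair of boots on your feet for the whole horizon."""
    pairs_needed = math.ceil(horizon_years / lifetime_years)
    return pairs_needed * price


good_boots = total_spend(price=50, lifetime_years=10, horizon_years=10)
# Assumption: a cheap pair lasts about one year ("a season or two").
cheap_boots = total_spend(price=10, lifetime_years=1, horizon_years=10)

print(good_boots, cheap_boots)   # prints: 50 100
```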
Why is it that collectively we seem to be fawning over tech visionaries, celebrities and politicians who have no connection with reality?
A tweet from the end of 2018 “You can summon your Tesla from your phone. Only short distances today, but in a few years summon will work from across the continent”. – Were there any plans for unattended recharging? How was this supposed to work?
Non-fungible Tokens (NFTs) – finally we have a use for the blockchain that is not a ponzi scheme – Really?
Microservices and the Cloud – Useful at massive scale in some organizations and contexts, but they seem to be adopted cargo-cult fashion by everyone, even for small scale applications. Luckily not everyone buys into this; see You Don’t need the cloud.
Single Page Applications (SPA) – in context, occasionally useful, but for many websites the overall effect is to make the information on the webpage slower to load and harder to bookmark
If the covid pandemic has taught us anything, it is that a lot of people with an audience are absolutely clueless about the topics on which they are pontificating. On Bullshit, a book published all the way back in 2005, could usefully be required reading for our current age.
I heard about this project in the early years of the Agile methodologies, another case of planning for the best case and then failing to realize that their reality check bounced. After an overrun of one or two years you would have thought that there would need to be a radical reappraisal of the approach.
Since then I have seen multiple projects which were supposedly agile that seem to have not heard of the first principle, early and continuous delivery of valuable software. One failure mode is to spend a lot of time in the project initiation activities, or gathering and documenting all requirements before starting to deliver software. Another, probably worse failure mode is to develop a framework for delivering the application, with the thought that this will make the eventual delivery of the application faster.
My take is that it is OK to invest in framework development, but not to do it as part of delivering business value in a project. The problem is similar to what used to happen in the early days of OO projects. The team would spend too much time building a framework that in the end turned out not to help the overall project, but added a lot of delivery risk.
If a company has money and people to burn, then it can make sense to either extract a framework from an existing application or speculatively create a framework. But this must be treated as an investment and must not be on the critical path for any real project until it has been proven out.
For normal projects, it is OK to spend one or two iterations at the start to build up some infrastructure and components for use, but after three iterations the project should be delivering real features to the user community. If you manage to go ten iterations without delivering customer value, the project is not agile, even if it is doing some of the agile ceremonies.
Humans are not very good at doing this. As the last two years have proved, lots of people have hoped for the best and then planned for that best case. This has not turned out to be a very good approach.
In the good times, planning on everything turning out reasonably well and running lean with just in time deliveries can result in good return on investment and higher profits. Typically there are enough buffers in the system that small interruptions can be dealt with, so a delay of a week or two in shipping due to storms does not cause the system to break down.
Bigger interruptions however can cause major problems, but ideally the effect should be localized. Earthquakes obviously have a major local impact, but unless one hits a monopoly provider location, the impact should not be global. Obviously in an era of offshoring to cheaper locations, there has been a lot of concentration of industry, so the vulnerability to regional disruption is worse.
The downside to the optimization however is the lack of slack in the system. This is when planning for the best case causes problems. Hoping that a pandemic is going to fade away quickly is OK, but making plans on that assumption is not sensible based on the history to date. By now policy makers should be thinking and talking about how many more waves could occur, rather than scrambling to contain the current wave. How do we get the number of active cases in the population low enough that we can control the spread in the long term?
One effect that is starting to be seen is the effect on staffing. How organizations cope when 5% of the staff are off at any one time is a relatively solved problem, but when 25% to 35% are off there are no ready made answers.
To use pikchr you really need to download it, play with the examples and read the documentation, but an example pikchr source file (input.pikchr) such as shown below is a great exemplar of the Diagrams as Code paradigm.
box color blue "A" fit;
circle color red "C"
arrow <-> "double headed" "arrow" width 200% chop;
circle "Small" fit;
box "long text will expand the box" fit;
Running the command pikchr --svg-only input.pikchr > image.svg gives the image in SVG format, ideal for just pasting into a markdown document to display on a web page.
pikchr works well for diagrams that are otherwise awkward to do with the normal visual drawing tools. It comes from the maker of fossil and sqlite, as a means of drawing the SQL syntax diagrams. It is useful when you need to draw several similar diagrams to illustrate an idea, while being able to version the changes in a repository so that the diagrams can be regenerated later if changes are needed.
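As a sketch of the several-similar-diagrams case, a few lines of Python can generate the pikchr source for each variant and, when the pikchr command is installed, render it with the same --svg-only option used above. The pipeline_diagram and render_svg helpers and the stage names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def pipeline_diagram(stages):
    """Return pikchr source for a chain of boxes joined by arrows."""
    statements = []
    for index, stage in enumerate(stages):
        if index:
            statements.append("arrow")
        statements.append(f'box "{stage}" fit')
    return ";\n".join(statements) + ";\n"


def render_svg(source, name):
    """Write name.pikchr, then convert it to name.svg with the pikchr command."""
    source_file = Path(f"{name}.pikchr")
    source_file.write_text(source)
    result = subprocess.run(
        ["pikchr", "--svg-only", str(source_file)],
        capture_output=True, text=True, check=True,
    )
    Path(f"{name}.svg").write_text(result.stdout)


# Hypothetical variants of one idea; stage names must not contain double quotes.
variants = {
    "build": ["commit", "build", "test"],
    "release": ["commit", "build", "test", "deploy"],
}

for name, stages in variants.items():
    source = pipeline_diagram(stages)
    print(f"--- {name}.pikchr ---")
    print(source)
    if shutil.which("pikchr"):      # only render when the tool is available
        render_svg(source, name)
```

Because both the generator and the .pikchr files are plain text, both can be versioned in the repository.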
… the logical necessity of driverless cars becomes clear. It seems likely there will be real-time auctions to determine the route your Google car takes, so you can be offered empowering choices along the way. […] One marketer put it quite frankly: the goal is to intercept people in their daily routines with brand and promotional messages.
The book draws a distinction between the use of a car to get somewhere, and the pure joy of just going out for a drive. The creators of autonomous vehicles do not seem to be drawn from the population that enjoys going out for a drive down a twisty road.
The book also highlights the problem of incomplete features and systems
… We are not just daunted by the obscure logic of such machines, but seem to feel ourselves responsible to them, afraid of being wrong in their presence, and therefore reluctant to challenge them even as the […] GPS directs us into a lake.
Many drivers of such vehicles even go as far as to defend these early attempts at making the technology work, thinking that it is acceptable for companies to test out their systems on public roads.
Systems designed to minimize the role of human intelligence tend to be brittle, as they are not able to anticipate every contingency. When they fail, their failures tend to be systemic, in proportion to the comprehensive reach of their control.
To date we have been lucky in that most times the driver has managed to react in time before the vehicle plows into a stationary object.
Certifications can make sense in the mechanical world where there is a One Best Way to achieve a desired outcome, or there is a basic level of competency that is awkward to test for. So many mechanical trades have safety certificates that have to be periodically renewed, and most countries have the idea of a driving license that is a permit to drive a specific type of vehicle. After all we do not want an electrician or gas fitter getting creative with the building code.
In software though, as Perl programmers say, There Is More Than One Way To Do It, and we do want developers to get creative (by some reports, parts of Ruby came from Perl). We don’t want to be certifying developers as capable of creating Perl CGI scripts when there is Ruby on Rails available. The same can be said for all of the cloud certifications: a money maker for the providers, but quickly outdated as the cloud providers release new capabilities every few months, and hey presto, you need to take that certification exam again (and obviously pay the fee again).
The problem is that moving from monoliths to microservices, which makes these tests more important, also makes them harder to build. Which is another good reason to stick with a nice simple monolith if you can. No, I’m not kidding.
Which in turn means you have to be sure to budget time, including design and maintenance time, for your integration testing. (Unit testing is just part of the basic coding budget.)
Of course, the onus is on the driver and two-ton machine to not hit pedestrians in the first place, but a system that would warn the victim of a possible collision through their phone doesn’t seem like a bad idea.
Hint to Honda: Some people do not carry cellphones with them, will it be OK to run them over in your future? If anyone is driving in an area of poor visibility, then drive a lot slower, how hard can that be?
Thinking about the testing pyramid that is the common picture used in many presentations, typically highlighting Unit, Functional and UI testing, and if the diagram comes from an agile background, it will put manual testing at the top and claim that manual testing is the most expensive. The claim is normally that automated unit tests are cheap to write and fast to run, automated functional tests cost more to write and run slower, and the automated UI tests are even more expensive to write and slower to run. Manual UI tests are at the top of the pyramid, the most expensive to write and the slowest to run.
The reality may be different, but that depends on your viewpoint.
Automated testing is not actually testing. It is Automated Checking, algorithmically checking conditions on the UI. An automated check can pass if it is only checking a few items on the UI, when some other parts of the UI have changed for the worse, but are not checked. The checks pass, you have a green, but the UI is broken.
So at this point the question is whether manual UI testing is more expensive than automated UI checking, if even a cursory scan by a tester would see the problem but the automated UI check would not see the problem. Yes we can check the entire contents of the UI in the automated check, but then the code doing the checking will run a lot slower, and be more extensive. The cost of maintenance gets much larger as well, since now ANY change on that page, such as an intended refactoring, will need the automated checking code to be adjusted.
My suggestion for addressing this is to make sure that whenever a manual UI test discovers a mistake, the automated checks are updated to detect that problem. This makes debugging the problem and fixing the bug easier, and provides an in built regression test for the future. Ideally add a unit test if that is the smallest test that can reveal the defect, otherwise use a Functional test or a UI test if that is what it takes to reveal the error.
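A minimal sketch of the gap between a narrow automated check and what a tester sees, and of tightening the check after a manual find. The two page fragments and the element ids are invented for illustration; a real check would run against the rendered UI.

```python
from html.parser import HTMLParser


class ElementIds(HTMLParser):
    """Collect the id attribute of every element on the page."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                self.ids.add(value)


def ids_on(page_html):
    parser = ElementIds()
    parser.feed(page_html)
    return parser.ids


# Hypothetical pages: the second has lost its Save button.
GOOD_PAGE = '<h1 id="title">Orders</h1><button id="submit">Save</button>'
BROKEN_PAGE = '<h1 id="title">Orders</h1>'


def narrow_check(page_html):
    """The original automated check: only looks for the title."""
    return "title" in ids_on(page_html)


def check_with_regression(page_html):
    """The same check, extended after a tester reported the missing button."""
    return {"title", "submit"} <= ids_on(page_html)


print(narrow_check(BROKEN_PAGE))            # True: a green, but the UI is broken
print(check_with_regression(BROKEN_PAGE))   # False: the defect is now detected
print(check_with_regression(GOOD_PAGE))     # True
```

The extended check still looks at only two elements, so it stays cheap to maintain while covering the defect that was actually found.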
Hurl is a Rust wrapper around libcurl that provides an easier to use interface to Curl. Not a replacement for Playwright, but for testing a JSON REST endpoint or validating the server side of HTML web pages it looks to have a place in the toolkit.
It is controlled either from stdin or files with a nice syntax
GET https://improvingwetware.com/
# confirm response code
HTTP/1.1 200
[Asserts]
xpath "//div[@class='post']" count == 20 # confirm 20 posts per page
[Captures]
nextPage: xpath "string(//span[@class='next']/a/@href)" # get href to the next page
GET https://improvingwetware.com{{nextPage}} # use captured href to navigate to that page
HTTP/1.1 200 # and make sure we found that page
[Asserts]
xpath "//div[@class='post']" count == 20
Only gotcha is that you might need to have an XPath Cheatsheet link handy if you have not needed XPath much for navigation around a web page (I mainly use CSS selectors for Selenium and Playwright)
Note: if you are running under the Windows command prompt, you cannot use the normal wildcard to specify multiple files, as hurl assumes that the wildcard has already been expanded before it sees the arguments, which the command prompt does not do. The fix is to run under WSL.
>hurl *.hurl --test
error: The filename, directory name, or volume label syntax is incorrect. (os error 123)
Under WSL there are no issues, as the shell expands the wildcards automatically.
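Where WSL is not an option, one possible workaround is to let a small Python script expand the wildcard and pass the resulting file names to hurl. This is only a sketch: the hurl_command helper is made up for illustration, and it assumes hurl is on the PATH.

```python
import glob
import shutil
import subprocess


def hurl_command(pattern="*.hurl"):
    """Expand the wildcard in Python and build the hurl command line."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # Same shape as the command above: the files first, then --test.
    return ["hurl", *files, "--test"]


# Only run hurl when it is installed and there is something to test.
if shutil.which("hurl") and glob.glob("*.hurl"):
    completed = subprocess.run(hurl_command())
    print("hurl exit code:", completed.returncode)
```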