HN.zip

A steam locomotive from 1993 broke my yarn test

141 points by jgrahamc - 60 comments
bouke [3 hidden]5 mins ago
So the real problem is that Jest just executes whatever `sl` resolves to. The fix they intend to release doesn't address that; it just tries to recognise the train steaming through. How is this acceptable behaviour from a test runner? It looks like a disaster waiting to happen. What if I have `alias sl='rm -rf /'`, as one typically wants to have such a command close at hand?
tlb [3 hidden]5 mins ago
Exec doesn't know about shell aliases. Only what's in the $PATH.

I liked the shell in MPW (Mac Programmer's Workshop, pre-NeXT) where common commands had both long names and short ones. You'd type the short ones at the prompt, but use the long, unambiguous ones in scripts.

skykooler [3 hidden]5 mins ago
Theoretically you could do this in Linux by calling /usr/bin/sl or whatever - but since various distros put binaries in different places, that would probably cause more problems than it could solve.
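If you do want to know which `sl` the test runner would end up with, the $PATH lookup is easy enough to reproduce by hand (a rough sketch in Node, not what Jest actually does):

    // Rough sketch: resolve "sl" the way exec would, by walking $PATH,
    // so you can see which binary would actually run.
    const fs = require('fs');
    const path = require('path');

    function resolveOnPath(cmd) {
      for (const dir of (process.env.PATH || '').split(path.delimiter)) {
        const candidate = path.join(dir, cmd);
        try {
          fs.accessSync(candidate, fs.constants.X_OK); // executable?
          return candidate; // first hit wins, same as the shell
        } catch (_) { /* not in this dir, keep looking */ }
      }
      return null;
    }

    console.log(resolveOnPath('sl')); // e.g. /usr/games/sl or /usr/local/bin/sl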
Kwpolska [3 hidden]5 mins ago
PowerShell has long commands and short aliases, but the aliases can still shadow executables, e.g. the `sc` alias for `Set-Content` shadows `sc.exe` for configuring services. And you only notice when you see no output and weird text files in the current working directory.
szszrk [3 hidden]5 mins ago
The networking crowd probably thinks it's obvious, because of things like the Cisco CLI, or even Mikrotik. Or the "ip" CLI as well, I guess.

I never bothered to check the origin of that pattern.

hnlmorg [3 hidden]5 mins ago
I've taken entire web farms offline due to an unexpected expansion of a command on a Cisco load balancer.

The command in question was:

    administer-all-port-shutdown 
(Or something to that effect; it's been many years now.)

And so I went to log in via the serial port (like I said, *many* years ago, so this device didn't have SSH), but didn't get the prompt I was expecting. So I typed the user name again:

    admin
And shortly afterwards all of our alarms started going off.

The worst part of the story is that this happened twice before I realised what I’d done!

I still maintain that the full command is a stupid name if it means a phrase as common as “admin” can turn your load balancer off. But I also learned a few valuable lessons about being more careful when running commands on Cisco gear.

Tractor8626 [3 hidden]5 mins ago
No. This is not the real problem. There is nothing you can do if your 'bash', 'ls', 'cat', 'grep', etc. do something they're not supposed to do.

Proper error handling would be helpful though.

Etheryte [3 hidden]5 mins ago
The fact that Jest blindly calls whatever binary is installed as `sl` is downright reckless, and that's an understatement. If they need the check, a simple way to avoid the problem would be to install it as a dependency, call `require.resolve()` [0], and Bob's your uncle. If they don't want the bundle size, write a heuristic; surely Meta can afford it. Blindly stuffing strings into exec and hoping it works out is not fine.
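Something along these lines would already be an improvement (just a sketch; the package name is hypothetical, I don't know what Sapling would actually be published as):

    // Sketch: only do Sapling detection if it's a declared dependency,
    // instead of trusting whatever `sl` happens to be first on $PATH.
    let saplingCli = null;
    try {
      // '@facebook/sapling-cli' is a made-up name for illustration.
      saplingCli = require.resolve('@facebook/sapling-cli');
    } catch (_) {
      // Not installed: skip Sapling-specific detection entirely.
    }

    if (saplingCli) {
      // ...invoke the binary shipped by the resolved package...
    }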

[0] https://nodejs.org/api/modules.html#requireresolverequest-op...

Joker_vD [3 hidden]5 mins ago
"That's just, like, your opinion, man". There is another school of thought that postulates that an app should use whatever tools that exist in the ambient environment that the user has provided the app with, instead of pulling and using random 4th-party dependencies from who knows where. If I symlinked e.g. "find", or "python3", or "sh", or "sl" to my weird interceptor/preprocessor/trapper script, that most likely means that I do want the apps to use it, damn it, not their own homebrewed versions.

> a simple way to avoid the problem would be to install it as a dependency

I once saw a Makefile that had "apt remove -y [libraries and tools that somehow confuse this Makefile] ; apt install -y [some other random crap]" as a pre-install step, I kid you not. Thankfully, I didn't run it with "sudo make" (as the README suggested), but holy shit, the presumptuousness of some people.

The better way would have been to have the "Sapling CLI" explicitly declared as a dependency, and checked for, somehow. But as the whole history of dev experience shows, that's too much to ask of people, and dev containers are, sadly, the sanest and most robust way to go.

Etheryte [3 hidden]5 mins ago
I think where our opinions differ is what boundaries this logic should cross. When I'm in Bash-land, I'm happy that my Bash-isms use the rest of what's available in the Bash env. When I'm in Node, likewise, as this is an expected and desirable outcome. Where this doesn't sit right with me is when a Node-land script crosses this boundary and starts mucking around with things from a different domain.

In general, I would want everything to work by the principle of least surprise, so Node stuff interacts with Node dependencies, Python does Python things, Bash does Bash env, etc. If I need one to interact with the other, I want to be explicit about it, not have some spooky action at a distance.

blueflow [3 hidden]5 mins ago
What else should the test runner do?
pavel_lishin [3 hidden]5 mins ago
There must be a better way to tell if a repo is a Sapling repo than by running some arbitrary binary, right?
Symbiote [3 hidden]5 mins ago
For Git one could look for .git/config. There must be something equivalent.
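Something like this, assuming Sapling marks its checkouts with a `.sl` directory the way Git and Mercurial use `.git` and `.hg` (I haven't verified that):

    // Sketch: detect the repo type from marker directories rather than
    // by executing whatever binary happens to be called `sl`.
    const fs = require('fs');
    const path = require('path');

    function detectRepo(dir) {
      if (fs.existsSync(path.join(dir, '.git'))) return 'git';
      if (fs.existsSync(path.join(dir, '.hg'))) return 'mercurial';
      if (fs.existsSync(path.join(dir, '.sl'))) return 'sapling'; // assumed marker
      return null;
    }

    console.log(detectRepo(process.cwd()));

In practice you'd also walk up parent directories until you hit the filesystem root, but the idea is the same.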
pasc1878 [3 hidden]5 mins ago
Use the full path of sl and don't rely on $PATH, the same way cron and macOS GUI apps do, I assume for this exact reason.
stonegray [3 hidden]5 mins ago
Is the full path guaranteed? For example, Homebrew, snap, and apt might each put it in a different place. $PATH is a useful tool.
pasc1878 [3 hidden]5 mins ago
But not in this case where you have two executables with the same name.

You have to know where the tool was installed or else be certain no other sl is on your path.

skipants [3 hidden]5 mins ago
What if the full path is just `/usr/bin/sl`?
pasc1878 [3 hidden]5 mins ago
Then you get the sl there, which could be the correct one.
charcircuit [3 hidden]5 mins ago
Finding the full path of sl requires looking at $PATH
pasc1878 [3 hidden]5 mins ago
In this case, no, since then you'd find the wrong sl - you need to know where the correct sl was installed.
salmonellaeater [3 hidden]5 mins ago
A useful error message would have made this a 1-minute investigation. The "fix" of trying to detect this specific program is much too narrow. The right fix is to change Yarn to print a message about what it was trying to do (check for a Sapling repo) and what happened instead. This is also likely a systemic problem, so a good engineer would go through the whole program and fix other places that need it.
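Even a crude wrapper around the call would do (a sketch; I'm assuming the probe is something like `sl root`, not necessarily what Yarn or Jest actually run):

    // Sketch: say what was attempted and what came back, rather than
    // failing with a generic error. "sl root" is an assumed probe command.
    const { execFile } = require('child_process');

    execFile('sl', ['root'], (err, stdout, stderr) => {
      if (err) {
        console.warn(
          'Tried running "sl root" to check for a Sapling repo, but it failed:\n' +
          '  ' + err.message + '\n' +
          '  stderr: ' + String(stderr).trim()
        );
      }
    });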
fifticon [3 hidden]5 mins ago
as a 30+y employed systems programmer, when I read a story like this, I get angry at the highly piled brittle system, not at the guy having sl installed. I am aware there exists a third option of not getting angry in the first place, but I hate opaque, nonrobust crap. This smells like everything I hate about front-end tooling: ignorance and arrogance in perfect balance.
ericmcer [3 hidden]5 mins ago
What would you have done differently? They were dependent on SL (which is a facebook source control system written in C) but the user had overwritten the expected path with a shell script. That is not something most engineers would build around... "what if the user is overwriting the path to dependencies with nonsense shell scripts?".

It doesn't feel like something that is entirely the Jest maintainers' fault. I am not sure why Jest needs a source control system, but there are probably decent reasons.

Like if I overwrite `ls` to a shell script that deletes everything on my desktop and then I execute code you wrote that relies on `ls` are you to blame because you didn't validate its behavior before calling it?

MD87 [3 hidden]5 mins ago
The difference is that `ls` is specified in POSIX and everyone has roughly the same expectations of what it does.

Nothing specifies what a binary called `sl` does. The user didn't "overwrite" anything. They just had an `sl` binary that was not the `sl` binary Jest expects. Arguably they had the more commonly known binary with that name.

mmlb [3 hidden]5 mins ago
Use the lessons learned from those before us in more heterogeneous days, aka inspect the binaries you're going to call out to for fitness. Things like "check if grep is gnu or bsd" or "check if sl is sapling or steamlocomotive".

I've done that a bit to deal with macOS's crippled bash, for example.
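In Node terms that's just a quick probe before trusting the binary (a sketch; I'm assuming Sapling's `sl --version` mentions "Sapling" somewhere in its output):

    // Sketch: check whether `sl` looks like Sapling before relying on it.
    const { execFileSync } = require('child_process');

    function looksLikeSapling() {
      try {
        const out = execFileSync('sl', ['--version'], {
          encoding: 'utf8',
          timeout: 2000, // don't sit there watching a locomotive animation
        });
        return /sapling/i.test(out); // assumed to appear in Sapling's output
      } catch (_) {
        return false; // wrong binary, timed out, or not installed at all
      }
    }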

Tractor8626 [3 hidden]5 mins ago
Totally happens in C code too. Maybe even more often.

Just today I had Proxmox not working because of an invalid localhost line in /etc/hosts. And a problem logging in to KDE because /etc/shadow was owned by root.

In both cases, only incomprehensible error messages. Luckily the solutions were googleable.

GTP [3 hidden]5 mins ago
Just from the title, I suspected that Steam Locomotive had something to do with it. So I quickly glanced through the article up to the point where the locomotive shows up. I sometimes toy with the idea of making a version called Slow Locomotive, where the train slows down every time you press Ctrl-C.
dullcrisp [3 hidden]5 mins ago
If you press ^Z does it stop entirely?

And do these sorts of ideas ever get you into trouble?

throwanem [3 hidden]5 mins ago
I once reimplemented in Perl Nethack's logic for phase-of-moon and Friday 13th computation and notification, and added the resulting cute little script to the root .profile on our consulting firm's main web hosting boxes.

I didn't get fired when my boss found it by surprise a couple months (and lunar cycles) later, but I did learn a valuable lesson about how one may wisely limit one's exercise of whimsy.

Google took a few years more to achieve the same discovery, as I recall, but presumably this has to do with pedagogical methods involving not as many ex-sergeants.

burnte [3 hidden]5 mins ago
I discovered SL in 1999, and forgot about it. I rediscovered it 5 years later when, on my personal server, I typoed ls as sl and hit enter. A steam locomotive drove across my screen, and I remembered installing it 5 years earlier and laughed my butt off. I wound up pranking myself and it took 5 years to pay off!
pjc50 [3 hidden]5 mins ago
Plus points for using strace. It's one of those debugging tools everyone knows about for emergencies that can't be solved at a higher level, and a great convenience of using Linux. The Windows ETW system is much harder to use, and I'm not sure if it's even possible at all under OSX security.
throwway120385 [3 hidden]5 mins ago
I have solved an incredible number of problems just by looking at strace output very carefully. Strace combined with Wireshark or Tcpdump is an incredible toolset for capturing what a program is doing and what the effect is, either on the USB bus or on the NIC.
frizlab [3 hidden]5 mins ago
macOS has dtrace, which is actually nicer to use. It cannot be used on all processes when SIP is on, though.
pjc50 [3 hidden]5 mins ago
Last time I tried, SIP prevented me from using it on my own processes, but I may have been holding it wrong.
dontlaugh [3 hidden]5 mins ago
macOS’s Solaris-inspired dtrace is actually nicer, especially the UI.
pjc50 [3 hidden]5 mins ago
Is there a guide for how to use this, including the UI, with SIP on?
jntun [3 hidden]5 mins ago
Instruments is implemented under the hood with dtrace; that could be what they are referring to.
dontlaugh [3 hidden]5 mins ago
Yes. Most things run well with Instruments attached. I’ve only used the dtrace cli a few times.
snovymgodym [3 hidden]5 mins ago
The real story here is that the author and his coworker wasted a bunch of time tracking down this bug because their dev environment was badly set up.

> his system (MacOS) is not affected at all versus mine (Linux)

> nvm use v20 didn't fix it

If you are writing something in NodeJS, 99% of the time it will only ever be deployed server-side on Linux, most likely in a container.

As such, your dev environment should include a dev dockerfile and all of your work should be done from that container. This also has the added benefit of marginally sandboxing the thousands of mystery-meat NPM packages that you will no doubt be downloading away from the rest of your machine.

There is zero reason to even mess with a "works on my machine" or a "try a different node version" situation on this kind of NodeJS project. Figure out your dependencies, codify them in your container definition, and move on. Oh, your tests work on MacOS? Great, it could not matter less because you're not deploying there.

Honestly, kind of shocking that a company like Cloudflare wouldn't have more standard development practices in place.

bilekas [3 hidden]5 mins ago
>If you are writing something like NodeJS, 99% of the time it will only ever be deployed server-side on Linux, most likely in a container.

I'm really curious where you're getting this impression from? I for one never run docker containers on my dual-core Atom server with 4 GB RAM... but I have a lot of node services running.

> There is zero reason to even mess with a "works on my machine" or a "try a different node version" situation on this kind of NodeJS project

There are a lot of reasons to investigate these things; in fact, that's what I would expect from larger, more industry-involved companies. Knowing the finer nuances and details of these things can be important. What might seem benign can just as quickly become something really dangerous or important when working at a huge scale such as CloudFlare.

Edit: BTW I do agree mistakes were made, and the hell that is NPM supply-chain attacks is terrifying. Those are the points I would focus on more, personally.

snovymgodym [3 hidden]5 mins ago
> I'm really curious where you're getting this impression from?

Experience mainly, though perhaps I live in a bubble. My "99%" assertion was more pointed at the "server-side on Linux" part than the "most likely in a container" part.

Really the point I wanted to make was that your development and test environment should be the same as, or as close as possible to, your production environment.

If your app is going to be deployed on Red Hat Enterprise Linux (whether in a container, VM, or baremetal), then don't bother chasing down cryptic NPM errors that arise when you run it on Ubuntu, Mac, or Windows. Just run everything out of a RHEL docker container which mimics your production environment and spend your limited time doing the actual task at hand. It simply is not worth your time to rabbit hole endlessly on NPM errors that happen in an environment you'll never deploy to.

> There are a lot of reasons to investigate these things, ...

Sure, I don't really disagree with that and generally it's good to have a solid understanding of your tools and what lies in the layers below the abstractions that you normally work with. The detective work in the post is solid.

But the thing is that the author was supposed to be learning NodeJS in order to ramp up on a React project. But he got derailed (heh) by this side quest which delayed him being able to do the actual work he set out to do. Whether or not it was worth the time is subjective. But either way, it would not have happened in the first place with better dev environment practices.

bilekas [3 hidden]5 mins ago
> Really the point I wanted to make was that your development and test environment should be the same as, or as close as possible to, your production environment.

I’m really glad to hear that actually, I think you did make that point but it was a bit overlooked with the other points.

About having better dev environments I think you're also spot on, not just with infrastructure but also with support from other, maybe more experienced developers who could identify these things early and share knowledge. For me at least that's one of the main development requirements: if you're not learning, you should be teaching.

throwanem [3 hidden]5 mins ago
The last time I dealt with a non-dockerized Node deployment, at work or at home, was in 2013. That this was also the year of Docker's initial release is no coincidence at all.
bilekas [3 hidden]5 mins ago
I think for production it’s a good move, it just doesn’t feel like a sure assumption that the majority of node services are containerized.
throwanem [3 hidden]5 mins ago
Well, the argument is more that the vast majority of Node services should be containerized, because the potentially large benefit of so doing outweighs the relatively small cost. I can't speak to anyone's assumptions, but I can say I'm inclined to support this argument because my professional experience for many years has been that containerization causes far fewer problems than it solves.
wrs [3 hidden]5 mins ago
I had a similar problem where builds were timing out. When I looked at the build log, there was a calendar in it (?!). I eventually figured out a script was calling `date`, and something I had `go install`ed (I think) had a test binary called `date` that was an interactive calendar.
Kwpolska [3 hidden]5 mins ago
Naming your source control tool after a common mistyping of ls is such a Facebook move.
m4rtink [3 hidden]5 mins ago
Yeah! What are they going to do next - call a programming language "go" or something? Even Google would not be that stupid - imagine Googling for that and getting only irrelevant stuff!
computerfriend [3 hidden]5 mins ago
Naming it after a commonly installed program that has been around since 1993 is also some hubris.
sureglymop [3 hidden]5 mins ago
Relatable debugging, though after 2 tries I would have moved straight to strace/truss.

Edit: okay I continued reading and that was actually the next step. :)

rossdavidh [3 hidden]5 mins ago
I demonstrated that I am not a serious or good programmer by installing steam locomotive on my Linux laptop immediately after reading this.
rrauenza [3 hidden]5 mins ago
I'm trying to recall -- wasn't there someone who had a similar issue with a game? Maybe a (pun not intended) Steam game? They'd try to run their game and something else would launch? Or vice versa?
normie3000 [3 hidden]5 mins ago
> git commit, which hooked into yarn test

There's the real wtf. How are you meant to commit a failing test? Or any other kind of work in progress?

zdragnar [3 hidden]5 mins ago
You mark the failing test with "failing". The test runner knows that it might fail but doesn't fail the suite.

I'm not a big fan of git commit hooks, but if you keep them lightweight (such as style linting or compiler warnings), they can give faster feedback than waiting for a CI runner to point out something that should have been obvious.

Edit: replaced "Todo" with "failing" since we're talking about jest specifically: https://jestjs.io/docs/api#testfailingname-fn-timeout

computerfriend [3 hidden]5 mins ago

    git commit -n
jokoon [3 hidden]5 mins ago
I thought a real steam locomotive had passed next to a data center and crashed the server because of the vibrations of the train.
zitterbewegung [3 hidden]5 mins ago
If you were troubleshooting this (and I know what I'm saying is with 20/20 hindsight), why wouldn't you try to test this on someone else's machine to see if it is an environment issue? They seemed to jump straight to extensive analysis at that point. Also, I've seen Jenkins deployments that have test runners that would run JS unit tests.
mzs [3 hidden]5 mins ago
polygot [3 hidden]5 mins ago
Would dev containers solve this issue?
WalterBright [3 hidden]5 mins ago
Not about steam locomotives. Disappointed.