Some colleagues and I just had a paper published. It’s a global study of forest fires, atmospheric ‘thirst’ and climate change. It’s rather grim - we find that in many different forest types all around the world, climate change is expected to lead to more days with drying conditions associated with fire activity. This in turn could have serious consequences for carbon storage, human health and all the other things affected by fire. The silver lining, if you want to put it that way, is that our paper adds to an already solid evidence base for the usefulness of vapour pressure deficit in predicting forest fire activity. We can always do with better fire predictions.
I’m not going to write about that here though. Instead, I’ll take you behind the scenes for a potted tour of the making of the paper. If you’re interested in a summary of the paper for lay readers, keep your eye on Pursuit (and possibly the Conversation) in the next week or so.
Inception
It was probably on a Friday afternoon, when the panic and the pressure of the week subsides enough to allow for slightly more relaxed conversations, that I first chatted with my supervisor about the idea for the study.
Them: We know that VPD (vapour pressure deficit) is connected to fire in southeast Australia and Portugal.
Me: True.
Them: What if we look at it globally?
Me: Like, satellite records, reanalysis data?
Them: Yeah. And climate change too.
Me: Hmm. It would be a lot of work.
Them: But a good study.
Me: Hmm. I guess conceptually it’s pretty straightforward. Get the VPD at the time and location of the fire, compare the two. Use GCM (global climate model) data to see how it might change.
Them: Yeah.
Me: It would be a lot of work.
Them: Do you think you could get it done in time for my EGU (European Geosciences Union) presentation?
Me: Hmm.
This is almost certainly a gross mischaracterisation, but there are a few things here. Scientists are constantly dreaming up studies and experiments. Only a small number of these make it through the door from potential to reality. One of the many factors this decision hinges on is the size of the study. Not just the effort required, but the expected return on investment.
If the idea is big enough, there may be a faint hope, or fantasy, or delusion, that the paper might be published in a High Impact Journal. I capitalise those words advisedly. Publishing your work in the most prestigious and highly ranked journals can have positive side effects on the reach of your work, your chance of a promotion, the success of your next grant application, the size of your ego and so on. These journals are very hard to get into though. What if you do a big study but the fancy journals don’t want a bar of it? Was it still worth it?
I don’t feel great that these are among the considerations in whipping up study ideas, but there we are. Most of the time, for me at least, we do not entertain the idea that our next paper will end up somewhere high falutin’. Of course, riches and enduring stardom aren’t the only incentives to do a study. Sometimes a simple deadline does wonders - like an upcoming conference presentation.
I liked the idea of having a crack, for the first time, at one of these journals. But I was under no illusions about the prospect of success. I have seen brilliant papers miss out, and pretty lame papers get in, and heard colleagues jokingly comparing their fancy journal Rejection Index.
For me there was an additional motivator: picking up some new skills in data wrangling. I was familiar with big climate and weather datasets from my PhD studies, but I was equally aware of how far behind many of my colleagues I was in the skills and best practices involved in acquiring, processing, storing and visualising data. This project felt like a great opportunity to force myself to develop some of those skills a little further. Whatever happened to the paper, those skills would hopefully be useful down the track.
The Matrix
So what exactly were these datasets I was wrangling, this world of cascading green characters I was about to dive into, Keanu Reeves style? There were basically three: remotely sensed fire detections, hybrid model-observation weather data and climate change projections.
The fire data was available from the pleasingly named fuoco server at the University of Maryland. I ended up doing a bunch of work on one version of the data, then scrapping it and starting all over again with a different version. The essence of this data is a global grid, at roughly 500 m resolution, of pixels that are either on or off for each day of the year. On means fire, off means no fire. We used about 69 GB of this data.
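To make that structure concrete, here is a minimal sketch in R of a daily on/off fire grid. Everything here is invented for illustration - the array is far smaller than a real tile, and the single detection is made up.

```r
# A toy version of the fire data structure: a stack of daily grids where
# each pixel is on (fire detected) or off (no fire). Dimensions are tiny
# compared to the real ~500 m tiles.
ndays <- 365
nrows <- 240
ncols <- 240

fire <- array(FALSE, dim = c(nrows, ncols, ndays))  # rows x cols x days
fire[120, 80, 37] <- TRUE                           # one detection on day 37

daily_counts <- apply(fire, 3, sum)  # burned-pixel count per day
which(daily_counts > 0)              # days with any fire activity
```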
As an aside, it was possible to get the data pre-divided into 20-odd continental windows. These cover areas like Australia, southern South America, northern South America, and so on. Quite a convenient way of packaging things up. It’s never enough to just have a lot of data - you always need to think about the most sensible way of breaking it up (if there’s enough to break up). We used not only these pre-defined continental windows, but also a terrestrial biome map, which lists major forest types, as well as a forest cover map, which some people lovingly crafted to track, as precisely as possible, only the land that is actually forest.
The hybrid weather data (a reanalysis) also started with one product and then switched to another. Unlike the fire data, though, where it was just two versions of the same thing, here we ended up switching from one reanalysis dataset to a completely different one. This was based partly on some intel I got from an expert in humidity on the other side of the world (humidity is used along with temperature to calculate VPD). We used about 45 GB of this data.
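Since VPD does a lot of work in this story, here is one common way to compute it from temperature and relative humidity, using the Tetens approximation for saturation vapour pressure. The function name and formulation are illustrative - I’m not claiming this is exactly the variant used in the paper.

```r
# Vapour pressure deficit (kPa) from air temperature (deg C) and relative
# humidity (%). Saturation vapour pressure via the Tetens approximation;
# VPD is the gap between what the air could hold and what it actually holds.
vpd_kpa <- function(temp_c, rh_pct) {
  es <- 0.6108 * exp(17.27 * temp_c / (temp_c + 237.3))  # saturation (kPa)
  es * (1 - rh_pct / 100)                                # deficit (kPa)
}

vpd_kpa(temp_c = 35, rh_pct = 20)  # a hot, dry fire-weather day: ~4.5 kPa
```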
The climate change data came from the modelling groups that contribute to the IPCC reports. Apart from developing and running models, they take part in a global intercomparison project (CMIP, the Coupled Model Intercomparison Project), where they agree on a set of settings to run their models with, so that when they’re done the results can be compared with each other. We tried to pick three models that performed reasonably well on a number of different measures, and we also picked a low and a high emissions scenario. There’s a measly 15 GB of global climate data sitting in the project folder.
Hackers
I have a love-hate relationship with code. I appreciate it for its ability to move mountains and do magic, but I resent it for its pretension. I mean, c’mon, a computer language is nothing like English or German or Spanish or any other language that people speak! I love language, but I do not love computer language.
I had to write a lot of code for this project. Most of it was in R, with a sizeable minority in bash and a dash in Python. I needed bash to make my R code work on the processing machines and supercomputers which did the heavy lifting in this project. Python was needed to talk to the provider of the weather data.
For this project, most of the work was in getting to the point where I could make a big table with very long columns of fire (Y/N), VPD value (kPa) and the ID for each forest type and continental window. Actually, I didn’t make one big table; I looped over those windows and forest types and built a table for each combination, one at a time (see the sketch below).
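Schematically, the looping looked something like this. The helper functions, IDs and file names are placeholders I’ve invented here; the real extraction code was far more involved.

```r
# Schematic only: build one fire/VPD table per continental window and
# forest type. load_fire_mask() and load_vpd() are hypothetical stand-ins
# for the real extraction code.
windows <- c("Aust", "SoSA", "NoSA")             # illustrative window IDs
biomes  <- c("temperate", "tropical", "boreal")  # illustrative forest types

for (w in windows) {
  for (b in biomes) {
    fire <- load_fire_mask(w, b)  # daily fire detections (Y/N)
    vpd  <- load_vpd(w, b)        # matching reanalysis VPD (kPa)
    tab  <- data.frame(fire = fire, vpd_kpa = vpd, window = w, biome = b)
    saveRDS(tab, sprintf("table_%s_%s.rds", w, b))  # one table at a time
  }
}
```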
I’m somewhat optimistic that my code did what it says on the tin. Still, all in all I would say my coding skills and practices are pretty crap. I seem to forever be caught between ‘quite advanced compared to people who don’t know anything about it’ and ‘a long, long way behind people who know what they’re doing’. I was lucky to have some very clever people help with code and data storage along the way.
The Neverending Story
What if I told you that we managed to get the whole analysis done inside a few months, in time for my supervisor’s conference presentation?
Well it’s true, all of it! Even the part where we were exchanging updates by email from different hemispheres in the hours and minutes before the talk. But I was not comfortable writing up the paper based on that sprint. So we went back and did things in a more orderly fashion. That took three years.
I didn’t spend all of that time working on the paper. I was occupied with other things most of the time. Every now and again I would dip back into it, but it was such a behemoth that it would take forever to remember where I was up to and so I wouldn’t make much progress. For long periods the project was like a shadow runnin’ through my days, like a beggar going from door to door. The longer it would take, the sicker I would feel. It was only really during 2021 that I was able to focus a bit more on getting through everything.
We finally submitted the dang thing to a journal in late 2021. It is a great feeling submitting a study to a journal. For most papers I have led (meaning my name is the first one in the author list), there have been many moments I doubted I would ever submit it, when I despaired at the time it was taking, and the form it had taken. I have this feeling less nowadays, but am still regularly disturbed by how long things take.
The dull ache of a paper not yet finished is replaced by the intermittent angst of a paper not yet spiked. You can relax knowing there is nothing you can do while it is being reviewed, but you also know that at any moment you will receive an email in which your paper might be rejected, ridiculed or otherwise ravaged by cunning reviewers.
A strange transformation sometimes happens during the review process. You start off with a clear and firm idea of what the paper is about, but by the end you have made so many compromises and concessions to the reviewers’ demands that the final paper can look quite different to what was submitted. Big fancy journals are known for asking for lots of extra work.
Our paper was quickly rejected by Nature Climate Change, which was the maximum fanciness I thought might be achievable with this particular study. We then tried Nature Communications and it somehow got sent out for review (rather than being rejected immediately). That was a big win! Several rounds of review ensued, with a decent amount of extra work, and just under 11 months after being submitted, it was accepted. One of the good things about this and some other journals is that they air all the dirty laundry of peer review. Anyone can come in and take a look at the back and forth between reviewers and study authors.
The paper was hugely expensive to publish. The ‘article processing charges’ are around €4890/$5890/£4290! Plus GST! This is your taxpayer funding at work. Publishing scientific articles is big, big business.
There was also a strange mismatch between the prestige and cost of a journal like this and the quality of some of the images in the article. To be frank, even I could have come up with a better JPG-to-PDF conversion than that.
Broadcast News
Oh, if it all only ended there. But no, once a paper is done you must tell people about it! Don’t get me wrong, I really do believe in communicating science outside academia, I’m just still tired.
I wrote a blue bird thread (probably one of my last on that site), I mentioned it on Mastodon and I shared it on LinkedIn. Thanks largely to the first of these, the paper has an “Altmetric” score of 115, meaning it is a ‘high attention’ paper. What does this mean?
So far, Altmetric has seen 213 tweets from 194 users, with an upper bound of 862,969 followers.
I’m not sure what that means. I’ve been chatting with the media team at work, and they have kindly decided to write a media release (to be delayed until after the state election!). I’ve also written a short article for their popularisation platform, Pursuit, which will appear soon, maybe also in the Conversation.
By the end of this, perhaps I’ll have milked the moment maximally.
Back to the Future
It’s been quite an ordeal to be honest, one I’m pleased is more or less over. Looking back from this privileged, published position, I’m not sure what I’ve learned. I already knew that I’ve been incredibly fortunate to work with some brilliant, supportive people. They contributed in many ways, not least of which was giving me the confidence to keep going. My data wrangling has gone up a notch or two, but I am convinced of the need to team up with some real data wrangling gurus in future. For better or worse, I might be a bit less intimidated by all these fancy journals?
What I can say for sure is that I have a newfound appreciation - an obsession, even - for tractable projects! Big data may be grand, but small is, as E.F. Schumacher said, beautiful.
So much admiration for your persistence, self-awareness, deliberate collabs and massive brain to be able to pull all this together.
Also glad you ended with the sentiment you did. Recently, sitting in a 2-hour meeting with 6 academics at the end of a funded research project, I butted in and said something along the lines of "From my privileged perspective as a practitioner whose career is not judged on number or profile of papers published, it doesn't matter how many papers you publish or where you publish them. As long as the science hits the emails or ears of the people who will use it and improves practice, then its purpose will be served".
There was shocked silence, confused looks and some shuffling of feet, a brief acknowledgement of the need for a 'Practitioners Guide' and, with a relieved sigh, a return to allocating RAs and PhD students to lead the close out of minor papers to acquit the grant. Fair to say I may have misjudged my setting and overplayed the 'KISS principle' approach.
Your article has given me a better understanding of the complexity my academic partners deal with. Thanks!
What a great commentary! I love the narrative of the journey, with a small dabbling of technical droplets and an emphasis on internal processing. It totally conveys the experience of conceiving projects, negotiating roles and timelines, doing analysis, finding the space to actually write articles and then revising them endlessly. I especially enjoyed that feeling of diving back in to a complex process, which takes a day to remember what you're doing and then, all too often, not having any more time to take the next step... very cool.