
Hack On Self: Collecting Data [Hackaday]


A month ago, I talked about using computers to hack on our day-to-day existence, specifically, augmenting my sense of time (or rather, lack thereof). Collecting data has been super helpful – and it’s best to automate it as much as possible. Furthermore, an augment can’t be annoying beyond the level you expect, and making it context-sensitive is important – the augment needs to understand whether it’s the right time to activate.

I want to talk about context sensitivity – it’s one of the aspects that brings us closest to the sci-fi future; currently, in some good ways and many bad ways. Your device needs to know what’s happening around it, which means that you need to give it data beyond what the augment itself is able to collect. Let me show you how you can extract fun insights from collecting data, using an example of a data source you can easily tap while on your computer, then talk about the implications of data collection, and why you should do it despite everything.

Started At The Workplace, Now We’re Here

Around 2018-2019, I was doing a fair bit of gig work – electronics, programming, electronics and programming, sometimes even programming and electronics. Of course, for some, I billed per hour, and I was asked to provide estimates. How many hours does it take for me to perform task X?

I decided to collect data on what I do on my computer – to make sure I can bill people as fairly as possible, and also to try and improve my estimate-making skills. Fortunately, I do a lot of my work on a laptop – surely I could monitor it very easily? Indeed, and unlike Microsoft Recall, neither LLMs nor people were harmed during this quest. What could be a proxy for “what I’m currently doing”? For a start, the title of the currently focused window.

All these alt-tabs, it feels like a miracle I manage to write articles sometimes

Thankfully, my laptop runs Linux, a hacker-friendly OS. I quickly wrote a Python script that polls the currently focused window, writing every change into a logfile, each day a new file. A fair bit of disk activity, but nothing that my SSDs can’t handle. Initially, I just let the script run 24/7, writing its silly little logs every time I Alt-Tabbed or opened a new window, checking them manually when I needed to give a client a retrospective estimate.
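A minimal sketch of such a polling loop, assuming an X11 session with the xdotool command installed. This is not the script from the GitHub repo: the function names, the log directory, and the "timestamp, tab, title" line format are illustrative assumptions.

```python
import subprocess
import time
from datetime import datetime
from pathlib import Path


def get_focused_title():
    """Ask xdotool (X11 only) for the focused window's title; '' on failure."""
    try:
        result = subprocess.run(
            ["xdotool", "getactivewindow", "getwindowname"],
            capture_output=True, text=True, timeout=2,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout.strip() if result.returncode == 0 else ""


def log_title_changes(log_dir, get_title=get_focused_title,
                      interval=1.0, max_polls=None, now=datetime.now):
    """Poll the focused window and append one line per title change."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    last_title = None
    polls = 0
    while max_polls is None or polls < max_polls:
        title = get_title()
        if title != last_title:
            stamp = now()
            # One file per day, named after the date of the change.
            path = log_dir / f"{stamp:%Y-%m-%d}.log"
            with path.open("a", encoding="utf-8") as f:
                f.write(f"{stamp.isoformat(timespec='seconds')}\t{title}\n")
            last_title = title
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

Calling log_title_changes(Path.home() / "window-logs") would run until interrupted. The get_title and now parameters exist only so the loop can be exercised without a real desktop session.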

I Alt-Tab a lot more than I expected, while somehow staying on course with the task and making progress. Also, as soon as I started trying to sort log entries into types of activity, I was quickly reminded that categorizing data is a whole project in itself – it’s no wonder big companies outsource it to the Global South for pennies. In the end, I can’t tell you a lot about data processing here, but only because I ended up not bothering with it much, thinking that I would do it One Day – and I likely will mention it later on.

Collect Data, And Usecases Will Come

Instead, over time, I came up with other uses for this data. As it ran in an always-open commandline window, I could always scroll up and see the timestamps. Of course, this meant I could keep tabs on things like my gaming habits – at least, after the fact. I fall asleep with my laptop by my side, and usually my laptop is one of the first things I check when I wake up. Quickly, I learned to scroll through the data to figure out when I went to sleep, when I woke up, and check how long I slept.
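The "scroll up and look for the quiet stretch" trick can also be automated. The sketch below assumes each log line starts with an ISO 8601 timestamp followed by a tab and the window title; that format is an assumption for illustration, as are the function names. It treats the longest gap between two consecutive entries as a rough guess at when the laptop, and probably its owner, was asleep.

```python
from datetime import datetime


def parse_timestamps(lines):
    """Extract the timestamp from each 'ISO-timestamp<TAB>title' line."""
    stamps = []
    for line in lines:
        stamp_text = line.split("\t", 1)[0].strip()
        try:
            stamps.append(datetime.fromisoformat(stamp_text))
        except ValueError:
            continue  # skip lines that do not start with a timestamp
    return sorted(stamps)


def longest_gap(stamps):
    """Return (start, end) of the longest stretch with no window switches."""
    if len(stamps) < 2:
        return None
    return max(zip(stamps, stamps[1:]), key=lambda pair: pair[1] - pair[0])
```

Feeding it the lines of two adjacent daily files gives a candidate bedtime and wake-up time. A long stretch of watching a movie in a single window would look the same, so the result is a hint rather than a measurement.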

seriously, check out D-Feet – turns out there’s so, so much you can find on D-Bus!

I also started tacking features on the side. One thing I added was monitoring media file playback, logging it alongside window title changes. Linux systems expose this information over D-Bus, and there’s a ton of other useful stuff there too! And D-Bus is way easier to work with than I’d heard, especially when you use a GUI explorer like D-Feet to help you learn the ropes.
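Media players on Linux usually publish playback state through the MPRIS interface on the session bus, under names starting with org.mpris.MediaPlayer2. Here is a hedged sketch of reading it, assuming the third-party dbus-python package is installed; the helper names are made up for illustration, and this is not necessarily how the original script does it.

```python
MPRIS_PREFIX = "org.mpris.MediaPlayer2."
MPRIS_PATH = "/org/mpris/MediaPlayer2"
PLAYER_IFACE = "org.mpris.MediaPlayer2.Player"


def mpris_players(bus_names):
    """Keep only bus names that belong to MPRIS media players."""
    return sorted(n for n in bus_names if n.startswith(MPRIS_PREFIX))


def describe_track(metadata):
    """Turn an MPRIS Metadata dict into a short 'artist - title' string."""
    title = str(metadata.get("xesam:title", "")) or "(unknown title)"
    artists = [str(a) for a in metadata.get("xesam:artist", [])]
    return f"{', '.join(artists)} - {title}" if artists else title


def now_playing():
    """Query every MPRIS player on the session bus (needs dbus-python)."""
    import dbus  # third-party; imported lazily so the helpers work without it

    bus = dbus.SessionBus()
    tracks = {}
    for name in mpris_players(str(n) for n in bus.list_names()):
        props = dbus.Interface(bus.get_object(name, MPRIS_PATH),
                               "org.freedesktop.DBus.Properties")
        status = str(props.Get(PLAYER_IFACE, "PlaybackStatus"))
        metadata = props.Get(PLAYER_IFACE, "Metadata")
        tracks[name] = (status, describe_track(metadata))
    return tracks
```

Calling now_playing() from the main polling loop and logging any change would give a playback history next to the window titles. Browsers typically show up here as players too, which is what makes the YouTube ideas plausible.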

The original idea was figuring out how much time I was spending actively watching YouTube videos, as opposed to watching them passively in the background, and trying to notice trends. Another idea was to keep an independent YouTube watch history, since the YouTube-integrated one is notoriously unreliable. I never actually did either of these, but the data is there whenever I feel the need to do so.

Of course, having the main loop modifiable meant that I could add some hardcoded on-window-switch actions, too. For instance, at some point I was participating in a Discord community and I had trouble remembering a particular community rule. No big deal – I programmed the script to show me a notification whenever I switched into that server, reminding me of the rule.
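A hook like that can be as small as a lookup table plus a call to notify-send. The sketch below is an assumption about how one might wire it up, with a hypothetical server name and reminder text; it fires only when focus moves into a matching window, not on every title change inside it.

```python
import subprocess

# Hypothetical rules: substring of the window title -> reminder text.
REMINDERS = {
    "Example Server - Discord": "Remember the community rule before posting.",
}


def send_notification(summary, body):
    """Show a desktop notification via notify-send, if it is available."""
    try:
        subprocess.run(["notify-send", summary, body], check=False)
    except OSError:
        pass  # notify-send is not installed; silently skip


def on_window_switch(old_title, new_title, rules=REMINDERS,
                     notify=send_notification):
    """Fire a reminder when focus moves into a matching window."""
    fired = []
    for needle, reminder in rules.items():
        entered = needle in new_title and needle not in (old_title or "")
        if entered:
            notify("Reminder", reminder)
            fired.append(reminder)
    return fired
```

The main loop would call on_window_switch(previous_title, current_title) each time it logs a change.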

whenever I wish, I have two years’ worth of data to learn from!

There is no shortage of information you can extract even from this simple data source. How much time do I spend talking to friends, and at which points in the day; how does that relate to my level of well-being? When I spend all-nighters on a project, how does the work graph look? Am I crashing by getting distracted by something unrelated, not asleep, but too sleepy to get up and get myself to bed? Can I estimate my focus levels at any point simply by measuring my Alt-Tab frequency, and then perhaps measure my typing speed alongside it and plot the two together on a graph?
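As one example of such a question, window switches per hour are easy to count. This sketch assumes log lines that begin with an ISO 8601 timestamp followed by a tab (an illustrative format, not a documented one), and whether the count really tracks focus is an open question rather than a result.

```python
from collections import Counter
from datetime import datetime


def switches_per_hour(lines):
    """Count log entries per clock hour as a rough focus signal."""
    counts = Counter()
    for line in lines:
        stamp_text = line.split("\t", 1)[0].strip()
        try:
            stamp = datetime.fromisoformat(stamp_text)
        except ValueError:
            continue  # ignore lines without a leading timestamp
        counts[stamp.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))
```

The resulting mapping of hour to switch count can go straight into a plotting library, next to typing speed or anything else sampled on the same hourly grid.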

Window title switches turned out to be a decent proxy for “what I’m currently doing with my computer”. Plus, it gives me a wonderful hook, of the “if I do X, I need to remember to do Y” variety – there can never be enough of those! Moreover, it provides me with sizeable amounts of data about myself, data that I now store. Some of you will be iffy about collecting such data – there are some good reasons for it.

Taking Back Power

We emit information just like we emit heat. As long as we are alive, there’s always something being digitized; even your shed in the woods is being observed by a spy satellite. The Internet revolution has made information emissivity increase exponentially, a widespread phenomenon it now uses to grow itself, since now your data pays for online articles, songs, and YouTube videos. Now there are entire databanks containing various small parts of your personality, way more than you could ever have been theoretically comfortable with, enough to track your moves before you’re aware you’re making them.

:¬)

Cloning is not yet here, but the Internet already contains your clone – it can sure answer your bank’s security questions, with a fair bit of your voice to impersonate you while doing so, not to mention all the little tidbits used to sway your purchases and voting preferences alike. When it comes to protections, all we have is pretenses like “privacy policies” and “data anonymization”. The EU is trying to move in the right direction through regulations like GDPR, with the Snowden disclosures having left a deep mark, but it’s barely enough and not a consistent trend.

Just like with heat signatures, not taking care of your information signature gives you zero advantages and a formidable threat profile, but if you are tapped into it, you can protect people – or preserve dictatorships. Now, if anyone deserves to have power over yourself, it’s you, as opposed to an algorithm currently tracking your toilet paper purchases, which might be used tomorrow to catch weed smokers when it notices an increase in late night snack runs. It’s already likely to be used to ramp up prices during an emergency, or just because of increased demand – that’s where all these e-ink pricetags come into play!

Isn’t It Ridiculous?

Your data will be collected by others no matter your preference, and it will not be shared with you, so you have to collect it yourself. Once you have it, you can use your data to understand yourself better, become stronger by compensating for your weaknesses, build healthier relationships with others, and live a more fulfilling and fun life overall. Collecting data also means knowing what others might collect and the power it provides, and this can help you fight and offset the damage you are bound to suffer because of datamining. Why are we not doing more of this, again?

We’ve got a lot of catching up to do. Our conversations can get recorded with the ever-present networked microphones and then datamined, but you don’t get a transcript of that one phone call where you made a doctor’s appointment and forgot to note the appointment time. Your store knows how often you buy toilet paper, what with the loyalty cards we use to get discounts while linking our purchases to our identities, but it is not kind enough to send you a notification saying it might be time to restock. Ever looked back on a roadtrip you did and wished you had a GPS track saved? Your telco operators know your location well enough, now even better with 5G towers, but you won’t get a log. Oh, also, your data can benefit us all, in a non-creepy way.

Unlike police departments, scientists are bound by ethics codes and can’t just buy data without the data owner’s consent – but science and scientific research is where our data could seriously shine. In fact, scientific research thrives when we can provide it with data we collected – just look at Apple Health. In particular, social sciences could really use a boost in available data, as reproducibility crises have no end in sight – research does turn out to skew a certain way when your survey respondents are other social science students.

Grab the power that you’re owed, collect your own data, store it safely, and see where it gets you – you will find good uses for it, whether it’s self-improvement, scientific research, or just building a motorized rolling chair that brings you to your bed when it notices you’ve become too tired after hacking all night. Speaking of which, my clock tells me it’s 5 AM.

Works, Helps, Grows

The code is on GitHub, for whatever purposes. This kind of program is a useful data source, and you could add it into other things you might want to build. This year, I slapped some websocket server code over the window monitoring code – now, other programs on my computer can connect to the websocket server, listen to messages, and make decisions based on my currently open windows and currently playing media. If you want to start tracking your computer activity right now, there are some promising programs you should consider – ActivityWatch looks really nice in particular.
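To show the shape of that publish-and-listen pattern without pulling in a websocket library, here is a sketch that swaps in a plain TCP socket carrying one JSON object per line, built on Python’s standard asyncio streams. It is not the websocket code from the repo; the class name, port, and message fields are all assumptions.

```python
import asyncio
import json


class WindowBroadcaster:
    """Fan out window/media events to every connected local client."""

    def __init__(self):
        self.clients = set()

    async def handle_client(self, reader, writer):
        """Register a client and keep it until it disconnects."""
        self.clients.add(writer)
        try:
            await reader.read()  # returns once the client closes its side
        except ConnectionError:
            pass
        finally:
            self.clients.discard(writer)
            writer.close()

    async def broadcast(self, event):
        """Send one JSON line to all clients, dropping broken connections."""
        data = (json.dumps(event) + "\n").encode("utf-8")
        for writer in list(self.clients):
            try:
                writer.write(data)
                await writer.drain()
            except ConnectionError:
                self.clients.discard(writer)

    async def start(self, host="127.0.0.1", port=8765):
        """Start listening; binds to localhost only by default."""
        return await asyncio.start_server(self.handle_client, host, port)
```

The monitoring loop would call broadcast({"type": "window", "title": ...}) on every change, and any local program can connect and read lines. Binding to 127.0.0.1 keeps the feed off the network, which matters given what window titles can reveal.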

I have plans for computer activity tracking beyond today – from tracking typing on the keyboard, to condensing this data into ongoing activity summaries. When storing data you collect, make sure you include a version number from the start and increment it on every data format change. You will improve upon your data formats and you will want to parse them all, and you’ll be thankful for having a version number to refer to.
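The version number advice is cheap to follow. A sketch, assuming records are stored as JSON lines; the field names and the "version 1" layout are invented here purely to show how an old format can be upgraded on read.

```python
import json

FORMAT_VERSION = 2  # bump on every change to the record layout


def dump_record(timestamp, title):
    """Serialize one event as a JSON line tagged with the current version."""
    return json.dumps({"v": FORMAT_VERSION, "ts": timestamp, "title": title})


def load_record(line):
    """Parse a JSON line from any known version into the current layout."""
    record = json.loads(line)
    version = record.get("v", 0)
    if version == 1:
        # Hypothetical older layout that named the fields 'time' and 'window'.
        return {"v": FORMAT_VERSION, "ts": record["time"],
                "title": record["window"]}
    if version == FORMAT_VERSION:
        return record
    raise ValueError(f"unknown data format version: {version}")
```

Records with no version at all fall through to the error, which is the situation the version field is there to prevent.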

The GitHub-published portion is currently being used for a bigger project, where the window monitoring code plays a crucial part. Specifically, I wanted to write a companion program that would help me stay on track when working on specific projects on my laptop. In a week’s time, I will show you that program, talk about how I came to create it and how it hooks into my brain, how much it helps me in the end, share the code, and give you yet another heap of cool things I’ve learned.

What other kinds of data could one collect?


