The guide to extracting statistics from your Signal conversations


To be able to visit each other during these strange Covid-19 times, my girlfriend and I have to “prove” our relationship to my country of residence. One of the pieces of evidence is the messages that we’ve been sending each other over the years that we’ve been together. Signal is our main medium of communication for messaging, calls, and video, and we send each other tens of messages daily. So to add weight to our application, we wanted to add statistics around the number of messages that we’ve sent each other throughout our relationship.

This guide is the result of my explorations into how to extract statistics from our Signal conversations. Beware, it is quite technical.

About Signal

Signal differentiates itself from other messaging apps like WhatsApp and Messenger by its focus on privacy. It was created by Moxie Marlinspike, a highly regarded computer security expert and cryptographer. Its list of testimonials is among the most powerful I’ve ever seen: Edward Snowden, Laura Poitras, and Bruce Schneier all recommend Signal. It’s open-source and the organization behind it is a non-profit.

Because of its focus on privacy, Signal does not store any messages on its servers. So the only way to access your messages for analysis is through your local copy in the Android, iOS, or desktop apps that Signal provides. Each requires a different method to analyze the messages and extract statistics.

From the Android app

1. Create a backup

The Signal Android app can create an encrypted backup of the messages stored on your device. See the instructions on the Signal website for how to do so.

2. Transfer the backup to your computer

To analyze the messages, we need to get the backup from your phone to your computer. I simply uploaded the encrypted backup to Google Drive and then downloaded it to my computer, but any other means of transferring files between your phone and computer works.

3. Decrypt the backup

We can’t access any messages in the backup until we decrypt it. Fortunately, someone built a tool for that called signal-back. Simply download a binary and run:

$ signal-back format -f CSV -o backup-android.csv <signal-XXX.backup>

It will ask for the 30-digit passphrase that Signal gave you when creating the backup in step 1. After running this command, backup-android.csv will contain a history of your messages in CSV format.
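
Before analyzing the file, it can help to confirm which columns it contains. The following is a minimal sketch, assuming the first line of the CSV is a header row; the helper name csv_headers is my own and not part of signal-back.

```ruby
require 'csv'

# Hypothetical helper: returns the column names from the first line of a CSV.
# Assumes the file starts with a header row.
def csv_headers(filepath)
  first_line = File.open(filepath, &:readline)
  CSV.parse_line(first_line)
end

# Example call (path is an assumption):
#   puts csv_headers('backup-android.csv')
```

The script in the next step relies on the THREAD_ID and DATE_SENT columns, so those are the names to look for in the result.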

4. Extract statistics from the CSV

Now that we have our messages in CSV, we can analyze them. I wrote a small Ruby script that parses the file and counts the messages in a particular conversation, grouped by year.

# analyze-signal-android.rb

require 'csv'

# Path to the backup CSV.
filepath = ARGV[0]
# Thread ID of the conversation you want to get stats from.
thread_id = ARGV[1]

messages = []
CSV.foreach(filepath, headers: true) do |row|
  if row['THREAD_ID'] == thread_id
    messages << {
      date_sent: Time.at(row['DATE_SENT'].to_i / 1000)
    }
  end
end

messages
  .group_by { |message| message[:date_sent].year }
  .each { |year, msgs| puts "#{year}: #{msgs.count} messages" }

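puts "Total number of messages: #{messages.count}"
# Pick the earliest message explicitly instead of assuming the CSV is sorted,
# and skip the line when the thread has no messages.
earliest = messages.min_by { |message| message[:date_sent] }
puts "Earliest date: #{earliest[:date_sent].strftime('%B %-d, %Y')}" if earliest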

To run it:

$ ruby analyze-signal-android.rb backup-android.csv 1
2018: 12676 messages
2019: 22225 messages
2020: 13381 messages
Total number of messages: 48282
Earliest date: May 2, 2018

The first argument to the script is the path to the backup CSV. The second is the Signal thread ID for which you want to get the statistics. For instance, the conversation with my girlfriend is assigned ID 1. I suggest you look into the CSV to find out the ID of the thread that you’re interested in.
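
Rather than scanning the CSV by eye, you can count the messages per thread and pick the busiest one. This is a minimal sketch that assumes the same THREAD_ID column the script above uses; the file name list-threads.rb and the method name are my own.

```ruby
# list-threads.rb (hypothetical file name)
require 'csv'

# Counts rows per THREAD_ID and returns [thread_id, count] pairs, busiest first.
def messages_per_thread(filepath)
  counts = Hash.new(0)
  CSV.foreach(filepath, headers: true) do |row|
    counts[row['THREAD_ID']] += 1
  end
  counts.sort_by { |_thread_id, count| -count }
end

# Only runs when a CSV path is given on the command line.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  messages_per_thread(ARGV[0]).each do |thread_id, count|
    puts "Thread #{thread_id}: #{count} messages"
  end
end
```

Run it with ruby list-threads.rb backup-android.csv. The thread with the highest count is usually the conversation you are after, but check a few rows to be sure.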

From the desktop app

The Signal desktop application is built on Electron and it stores the messages in an SQLite database.

1. Get the SQLite database

Depending on whether you’re on Linux, Mac, or Windows, the SQLite database is located in a different folder:

  • Linux: ~/.config/Signal/sql/db.sqlite
  • Mac: ~/Library/Application Support/Signal/sql/db.sqlite
  • Windows: C:\Users\<YourName>\AppData\Roaming\Signal\sql\db.sqlite (not tested)
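
If you want a script to find the database on its own, the list above can be sketched as a small Ruby helper. The method name signal_db_path is hypothetical, and the Windows branch assumes the APPDATA environment variable points at AppData\Roaming; like the path above, it is not tested.

```ruby
require 'rbconfig'

# Hypothetical helper: expected location of the Signal desktop database.
# The platform, home directory, and APPDATA value are parameters so the
# function can be tried out without touching the real environment.
def signal_db_path(host_os: RbConfig::CONFIG['host_os'], home: Dir.home, appdata: ENV['APPDATA'])
  base =
    case host_os
    when /darwin/ then File.join(home, 'Library', 'Application Support', 'Signal')
    when /mswin|mingw|cygwin/ then File.join(appdata.to_s, 'Signal')
    else File.join(home, '.config', 'Signal') # Linux
    end
  File.join(base, 'sql', 'db.sqlite')
end

# Example call: puts signal_db_path
```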

2. Decrypt and convert the SQLite database to CSV

The Signal SQLite database is encrypted using SQLCipher. So we need to decrypt it and convert it to CSV before we can analyze it.

# sqlite-to-csv.sh

# From https://unix.stackexchange.com/a/505009/413853
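# Signal's data directory on Linux; adjust for Mac or Windows (see step 1).
sigBase="${HOME}/.config/Signal/"
# The database key is stored in config.json.
key=$( /usr/bin/jq -r '.key' "${sigBase}config.json" )
db="${sigBase}sql/db.sqlite"
clearTextMsgs="${sigBase}backup-desktop.csv"

# Unlock the database with the raw key and dump the JSON of every message, one per line.
/usr/bin/sqlcipher -list -noheader "$db" "PRAGMA key = \"x'${key}'\";select json from messages;" > "$clearTextMsgs"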

To run this script:

$ bash sqlite-to-csv.sh

You’ll need to install a sufficiently recent version of sqlcipher. I first ran the script with sqlcipher 3.15.2, but it failed with the error “Error: file is encrypted or is not a database”. It worked with sqlcipher 3.31.0, which I built from source.

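The file backup-desktop.csv, written to ~/.config/Signal/, now contains the messages stored in the Signal desktop application. Despite the .csv extension, each line is a JSON document rather than a comma-separated row.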
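
For readers who prefer to stay in Ruby, the same export can be sketched as follows. This assumes, as in the shell script, that config.json has a top-level "key" entry and that a sqlcipher binary is on your PATH; the method names are my own.

```ruby
require 'json'
require 'open3'

# Builds the SQL passed to sqlcipher: unlock the database with the raw hex
# key from config.json, then select the JSON column of every message.
def export_sql(config_json)
  key = JSON.parse(config_json).fetch('key')
  %(PRAGMA key = "x'#{key}'";select json from messages;)
end

# Hypothetical wrapper around the sqlcipher command line used above.
# Writes one JSON document per line to out_path.
def export_messages(db_path, config_path, out_path)
  sql = export_sql(File.read(config_path))
  output, status = Open3.capture2('sqlcipher', '-list', '-noheader', db_path, sql)
  raise 'sqlcipher failed' unless status.success?
  File.write(out_path, output)
end
```

Using fetch makes the script fail loudly if config.json has no "key" entry, instead of passing an empty key to sqlcipher.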

3. Extract statistics from the CSV

The format of the file from the desktop application is different from the one generated from the Android app, so another Ruby script extracts the statistics:

# analyze-signal-desktop.rb

require 'json'

# Path to the file exported from the desktop database.
filepath = ARGV[0]
# Phone number of the person whose conversation you want to analyze.
number = ARGV[1]

messages = []
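File.foreach(filepath) do |line|
  begin
    json = JSON.parse(line)
  rescue JSON::ParserError
    # Skip lines that are not valid JSON instead of hiding every error.
    next
  end
  next unless json.is_a?(Hash)

  # Keep messages sent only to this number or received from it.
  if json["delivered_to"] == [number] || json["source"] == number
    messages << {
      date_sent: Time.at(json['sent_at'].to_i / 1000)
    }
  end
end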

messages
  .group_by { |message| message[:date_sent].year }
  .each { |year, msgs| puts "#{year}: #{msgs.count} messages" }

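puts "Total number of messages: #{messages.count}"
# Pick the earliest message explicitly instead of assuming the export is sorted,
# and skip the line when no message matches the number.
earliest = messages.min_by { |message| message[:date_sent] }
puts "Earliest date: #{earliest[:date_sent].strftime('%B %-d, %Y')}" if earliest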

To run it:

$ ruby analyze-signal-desktop.rb backup-desktop.csv <phone-number>
2018: 8318 messages
2019: 22258 messages
2020: 13381 messages
Total number of messages: 43957
Earliest date: July 28, 2018

The first argument is the path to the file exported from the decrypted SQLite database, and the second is the phone number of the person whose conversation with you you want to analyze.
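
The same two fields can also split the total into messages sent and received. This is a sketch under an assumption I have not verified against Signal's schema: that "source" equal to the number means the message came from that person, and "delivered_to" equal to [number] means you sent it to them. The method names are my own.

```ruby
require 'json'

# Classifies one parsed message relative to `number` (assumed field meanings):
# :received if it came from that number, :sent if it was delivered only to it,
# nil if it belongs to another conversation.
def direction(json, number)
  return :received if json['source'] == number
  return :sent if json['delivered_to'] == [number]
  nil
end

# Counts messages per direction over an enumerable of JSON lines.
def count_by_direction(lines, number)
  counts = Hash.new(0)
  lines.each do |line|
    begin
      json = JSON.parse(line)
    rescue JSON::ParserError
      next # skip lines that are not valid JSON
    end
    next unless json.is_a?(Hash)

    dir = direction(json, number)
    counts[dir] += 1 if dir
  end
  counts
end

# Example call (path and number are placeholders):
#   p count_by_direction(File.foreach('backup-desktop.csv'), '<phone-number>')
```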

From iOS

I haven’t looked into this because I don’t use an iPhone.

Source code

The source code snippets in this article are also on GitHub.