Written by Marcus Belcastro for The Science Commons Initiative - Jan 2023
TLDR is at the bottom under 'Summary' For those who want a quick read
The Berkeley Open Infrastructure for Network Computing (BOINC) is a community that has been established for more than two decades now, and in that time we have seen the highs and lows of what hundreds of thousands of everyday computers from around the globe can collectively accomplish. For those who don't know, the BOINC network is a volunteer computing platform where everyday people can use their computers to accomplish real-world scientific work and has been responsible for several advances in mathematics, biology, astronomy, the study of cancer and even the development of treatments for COVID-19.
The BOINC Census is an ongoing initiative that captures the perception and ideas that people have about BOINC in order to help the community and BOINC projects grow. There has only been one other poll of this kind which was conducted in 2006 and this one will be the first of many to come.
This Census is led by The Science Commons Initiative (The SCI) which is a not-for-profit based in the US that aims to reconnect everyday people with science, and one way they are doing this is improving the BOINC ecosystem. This initiative is something that I led within The SCI because I felt like this information is something that the entire community will use going into the future and I hope that these metrics will help guide development decisions and new projects going forward.
The raw data for this poll is available here and was collected using Fillout.
If you would like to be notified when the next Census will be released and its results, click here or go to The Science Commons Initiative website and sign up to their newsletter.
This year, we had 1119 Responses, 939 of which finished the form. From here-on we will be using the results from the completed forms unless stated otherwise.
It's also good to see that The vast majority of respondents have used BOINC before!
import re
import csv
from collections import Counter
import numpy as np
import requests
import matplotlib.pyplot as plt
import seaborn
raw_data_str = requests.get('https://raw.githubusercontent.com/TheSCInitiative/BOINC/main/BOINC_Census/2022/data_source/BOINC_Census_2022_results_raw_cleaned.csv').text
data_str = requests.get('https://raw.githubusercontent.com/TheSCInitiative/BOINC/main/BOINC_Census/2022/data_source/BOINC_Census_2022_results_completed.csv').text
raw_data = [row for row in csv.reader(raw_data_str.split('\n')) if bool(row)]
data = [row for row in csv.reader(data_str.split('\n')) if bool(row)]
headers = data[0]
# Remove headers from data
raw_data = raw_data[1:]
data = data[1:]
# Eye candy settings
BINARY_COLOURS = ['#38CF62', '#E65050']
GENERAL_COLOURS = seaborn.color_palette("colorblind")
BOINC_BLUE = '#163E72'
def get_col(heading: str, data_source: list):
idx = headers.index(heading)
return [row[idx] for row in data_source]
def multi_sel_to_hist(mult_sel_col: list[str]):
'''
Converts a column with CSV values in each cell to a simple list that can
be turned into a histogram.
Fillout does not perform escaping on internal commas so we have to handle it.
Because I always put a space after a comma, we can safely replace it.
'''
return sum([row.replace(', ', ' ').split(',') for row in mult_sel_col], start=[])
print(f'Total Submissions: {len(raw_data)}')
print(f'Completed Submissions: {len(data)}')
fig, ax = plt.subplots(1, 2)
fig.set_figwidth(15)
used_boinc_counter = Counter(get_col('Do you or have you ever used BOINC?', data))
ax[0].pie([len(data), len(raw_data) - len(data)], labels=['Completed', 'Not Completed'], autopct='%.0f%%', colors=BINARY_COLOURS)
ax[1].pie([used_boinc_counter['Yes'], used_boinc_counter['No']], labels=['Yes', 'No'], autopct='%.0f%%', colors=BINARY_COLOURS)
ax[0].title.set_text('Form Completion Rate')
ax[1].title.set_text('Do you or have you ever used BOINC?')
plt.show()
Total Submissions: 1119 Completed Submissions: 939
Let's take a look at what the sample of BOINC users looks like broken down into high-level demographics!
There were a few people who, at the end of the survey, asked why we need to know these things. My response to their concern is that if we know what type of people use BOINC on a daily basis, then we can understand a number of things. We can help developers and project admins know their audience and better tend to their demands, and we can also identify parts of the community where bias might exist. For example, does BOINC appear too technologically intimidating? This might be reflected by the huge portion of highly educated members and maybe low representation of younger audiences.
As people mostly expected provided the poll conducted in 2006, the vast majority of BOINC users identify as male.
Interesting to note that the age groups of BOINC users are decently mixed with 31-50 consisting of the top-two age groups.
I was surprised to know that tertiary graduates make up almost two-thirds of the respondents.
People from the North-America made up the majority of the respondents, Europeans coming second.
fig, ax = plt.subplots(1, 2)
fig.set_figwidth(15)
gender_groups = Counter(get_col('What gender do you identify as?', data)).items()
age_groups = Counter(get_col('What is your age?', data))
age_groups_sorted = sorted(age_groups.items(), key=lambda v: ord(v[0][0]) + 999 * int(v[0].endswith('+'))) # Ensure that 'more than' values are put to the end
ax[0].bar([a[0] for a in gender_groups], [a[1] for a in gender_groups], color=BOINC_BLUE)
ax[1].bar([a[0] for a in age_groups_sorted], [a[1] for a in age_groups_sorted], color=BOINC_BLUE)
ax[0].title.set_text('What gender do you identify as?')
ax[1].title.set_text('What is your age?')
plt.show()
fig, ax = plt.subplots(1, 2)
fig.set_figwidth(15)
edu_groups = Counter(get_col('What is your highest level of education you have completed?', data))
country_groups = Counter(get_col('What country are you from?', data))
# Take the top 5 sources to prevent clutter
country_groups = sorted(country_groups.items(), key=lambda x: x[1], reverse=True)[:5]
# Combine none and other
edu_groups['None or Other'] = edu_groups['No Education'] + edu_groups['Other']
edu_groups.pop('No Education')
edu_groups.pop('Other')
repl_dict = {
'United States of America' : 'US',
'United Kingdom' : 'UK',
'Tertiary Education (postgraduate)' : 'Uni Postgrad',
'Tertiary Education (undergraduate)' : 'Uni Undergrad',
'Secondary Education' : 'Secondary',
'Primary Education' : 'Primary'
}
edu_groups = edu_groups.items()
ax[0].bar([repl_dict.get(a[0], a[0]) for a in edu_groups], [a[1] for a in edu_groups], color=BOINC_BLUE)
ax[1].bar([repl_dict.get(a[0], a[0]) for a in country_groups], [a[1] for a in country_groups], color=BOINC_BLUE)
ax[0].title.set_text('What is your highest level of education?')
ax[1].title.set_text('What country are you from? (top-five)')
plt.show()
Let's get to know how people use BOINC!
Firstly we have computing devices. We find that generally people have 1-5 computers running BOINC and that the use of mobile devices is heavily skewed to no use at all.
Investigating further we show that only 23% of respondents have a mobile device regularly running BOINC.
fig, ax = plt.subplots(1, 3)
fig.set_figwidth(15)
computers = Counter(get_col('How many computers do you have that run BOINC regularly?', data))
computers_sorted = sorted(computers.items(), key=lambda v: ord(v[0][0]) + 999 * int(v[0].endswith('+'))) # Ensure that 'more than' values are put to the end
mobiles = Counter(get_col('How many of the above computers are mobile devices? (mobile phones or tablets)', data))
mobiles_sorted = sorted(mobiles.items(), key=lambda v: ord(v[0][0]) + 999 * int(v[0].endswith('+'))) # Ensure that 'more than' values are put to the end
ax[0].bar([c[0] for c in computers_sorted], [c[1] for c in computers_sorted], color=BOINC_BLUE)
ax[1].bar([c[0] for c in mobiles_sorted], [c[1] for c in mobiles_sorted], color=BOINC_BLUE)
ax[2].pie([len(data) - mobiles['0'], mobiles['0']], labels=['BOINC on mobile', 'No BOINC mobile'], autopct='%.0f%%', colors=BINARY_COLOURS)
ax[0].title.set_text('How many devices do you have running BOINC?')
ax[1].title.set_text('How many are mobile devices?')
ax[2].title.set_text('BOINC on mobile rate')
plt.show()
Another question that was asked is how often people run BOINC.
We see a huge percentage of respondents say that they run it most of the time, 24/7 even whereas about 25% run it sparingly or when their computers are idle.
boinc_often = Counter(get_col('How often do you run BOINC?', data)).items()
repl_dict = {
'A few days of the week' : 'Few days/wk',
'All the time (24/7)' : '24/7',
'Most days of the week' : 'Most days',
'A few hours every week' : '<1day/wk',
"I don't use BOINC" : 'N/A',
"Only when I'm not using the computer" : 'Only Idle'
}
plt.bar([repl_dict.get(a[0], a[0]) for a in boinc_often], [a[1] for a in boinc_often], color=BOINC_BLUE)
plt.title('How often do you run BOINC?')
plt.show()
The one thing that a lot of us want to see with BOINC is that our contribution has a positive impact on the world. So there were a few questions in this census which gathered some basic metrics on it.
Firstly let's talk about enthusiasm in the community. We asked how much impact people think they are having and it seems a little pessimistic according to me. The majority of responses were people thinking they only make a small impact on the world by running BOINC.
Secondly let's talk about contributions to greenhouse gas emissions. We know that BOINC crunching is computationally expensive and computers take up a decent amount of electricity to crunch BOINC tasks, so the community might be concerned over the dominance of particular energy sources of crunchers. We find that most crunchers are unsure or use fossil fuels, but there is a decent amount of crunchers who use renewable sources of electricity. Examining the responses to the 'Other' section, we find a decent amount of people using nuclear power as a source of electricity which is, apart from the process of acquiring the uranium, a relatively clean source of electricity.
By the looks of this, it appears that we have a lot of work to do both on the community side and the emissions side of BOINC.
fig, ax = plt.subplots(1, 2)
fig.set_figwidth(15)
contrib_impact = Counter(get_col('How much impact do you think your BOINC contribution is having?', data))
contrib_impact.pop('') # Empty responses were recorded due to an error in the form. This occurred 13 times.
contrib_impact = contrib_impact.items()
energy_src = Counter(get_col('What source do you get most of your electricity from?', data)).items()
ax[0].bar([a[0] for a in contrib_impact], [a[1] for a in contrib_impact], color=BOINC_BLUE)
ax[1].bar([re.sub(r' \(.*\)', '', a[0]) for a in energy_src], [a[1] for a in energy_src], color=BOINC_BLUE)
ax[0].title.set_text('What impact do you think your contribution is having?')
ax[1].title.set_text('What source do you get most of your electricity from?')
plt.show()
Another angle of impact is how much financial and non-financial impact people are having on the projects themselves. Outside of crunching, volunteers can contribute to BOINC projects in a variety of ways, these of which can be financial (directly donating to a project) or non-financial (donating PC/server hardware to the project or doing some coding to help the project fix a bug or implement a new feature).
I wanted to see if there is a decent difference between people contributing financially vs non-financially because my assumption was that a lot more people would contribute non-financially compared to financially. I was shocked to see that they were about the same!
Next census, I will likely combine this into one large question so that we can easily see total proportions.
fig, ax = plt.subplots(1, 2)
fig.set_figwidth(15)
financially = list(Counter(get_col('Have you ever supported a BOINC project financially?', data)).items())
financially.sort(reverse=True)
non_financially = list(Counter(get_col('Aside from running BOINC, have you ever supported a BOINC project non-financially? (eg, donating computer hardware, volunteering time)', data)).items())
non_financially.sort(reverse=True)
ax[0].pie([a[1] for a in financially], labels=[a[0] for a in financially], autopct='%.0f%%', colors=BINARY_COLOURS)
ax[1].pie([a[1] for a in non_financially], labels=[a[0] for a in non_financially], autopct='%.0f%%', colors=BINARY_COLOURS)
ax[0].title.set_text('Supported a BOINC project financially')
ax[1].title.set_text('Supported a BOINC project non-financially')
plt.show()
In this section, we will see when and where people came to know about BOINC.
In terms of when people learned about BOINC, it seems to be pretty flat apart from the value at 2002. This is likely because we had a lot of respondents who are BOINC veterans and have been around since the beginning. Based on this data, I can definitely foresee in the next census that I'll be changing this part of the form to a numerical input instead of a combo-box.
Then I picked the top 5 places where people first heard about BOINC. It looks like we might also be changing the form here too as a lot of respondents were unsure or chose other responses. Beyond that, the top responses included web searches, in a news article (online or physical) or from a friend.
Investigating the 'other' responses, we find that the majority of responses are linked to SETI@home when they moved from their own app to the BOINC system. This is followed by a decent number of people discovering it from an online forum and a few responses that say they read about it in a journal article. You will also find that there is one person that said they found BOINC was mentioned in a school textbook!
There was also one response which said that they "googled what to do with old PCs" and BOINC showed up. I think that this is something that BOINC can capitalize on in terms of discoverability and SEO.
fig, ax = plt.subplots(1, 2)
fig.set_figwidth(15)
when_boinc = sorted(Counter(get_col('What year did you first hear about BOINC?', data)).items())
where_boinc = Counter(get_col('How did you first hear about BOINC?', data))
# Take the top 5 sources to prevent clutter
where_boinc_top = sorted(where_boinc.items(), key=lambda x: x[1], reverse=True)[:5]
repl_dict = {
'Via a Google search or other search engine' : 'Web search',
'From a news article (both print or digital)' : 'News',
'Word-of-mouth/a friend told me' : 'Friend'
}
ax[0].bar([a[0][2:] for a in when_boinc], [a[1] for a in when_boinc], color=BOINC_BLUE)
ax[1].bar([repl_dict.get(a[0], a[0]) for a in where_boinc_top], [a[1] for a in where_boinc_top], color=BOINC_BLUE)
ax[0].title.set_text('What year did you first hear about BOINC?')
ax[1].title.set_text('How did you first hear about BOINC? (top-five)')
plt.show()
Here are some interesting statistics for project administrators to look at.
Keep in mind that the following two graphs include data that came from multi-select boxes (people can select more than one option)
Represented by the first graph is where people typically look if they have issues or need help. The top 5 results here look pretty obvious, people would prefer to look at project forums, chat groups or do a web search to solve their problem. Interesting to see that the official BOINC site is quite popular in this case.
When a discovery is made, or if something has happened to the project, people want to know about it. So how do people want to be notified?
Notifications via the BOINC manager takes the cake as the top response. Not many projects do this often and people are saying they want to see more of it! Email also appears to be a very valid option too. I know when WCG was still apart of IBM, I signed up to their monthly newsletter and enjoyed reading what the project was doing every now and then. Yet no other BOINC project has done this in my personal experience. There were also a lot of responses for general feedback about the BOINC Census asking for this survey to be advertised on the BOINC manager.
fig, ax = plt.subplots(1, 2)
fig.set_figwidth(15)
where_help = Counter(multi_sel_to_hist(get_col('Where do you seek help for problems you have with running BOINC?', data)))
where_hear = Counter(multi_sel_to_hist(get_col('In what ways would you like to hear about what BOINC projects are doing?', data)))
# Take the top 5 options to prevent clutter
where_help_top = sorted(where_help.items(), key=lambda x: x[1], reverse=True)[:5]
where_hear_top = sorted(where_hear.items(), key=lambda x: x[1], reverse=True)[:5]
repl_dict = {
'BOINC project forums/message boards' : 'Proj. forums',
'Google search or other search engine' : 'Web search',
'The official BOINC forum/message board' : 'BOINC forum',
'The BOINC website' : 'BOINC website',
'Chat groups like Discord or Telegram' : 'Chat groups',
'Via the BOINC manager' : 'BOINC manager',
'Via a web page or dashboard' : 'Web page',
'Email notifications' : 'Email',
'Forum/message board posts' : 'Forum post'
}
ax[0].bar([repl_dict.get(a[0], a[0]) for a in where_help_top], [a[1] for a in where_help_top], color=BOINC_BLUE)
ax[1].bar([repl_dict.get(a[0], a[0]) for a in where_hear_top], [a[1] for a in where_hear_top], color=BOINC_BLUE)
ax[0].title.set_text('Where do you seek help for BOINC? (top-five)')
ax[1].title.set_text('How do you want to hear about BOINC? (top-five)')
plt.show()