03/26/18

Brief Thoughts and My Remarks from #EAW18

Last week I had the pleasure of attending the National Forum on Ethics and Archiving the Web, held in New York City. The event itself ran from Thursday to Saturday and featured a highly diverse and outstanding lineup of speakers from the worlds of libraries, archives, digital humanities, journalism, media studies, art, ethnic studies and more. I didn’t do the best job of taking my own notes but highlights included Safiya Noble’s keynote on her work related to her recent book Algorithms of Oppression.

I primarily wanted to use this post to share the remarks I made as part of my Thursday panel, “Web Archiving as Civic Duty.” I was honored to share the stage with four other people (sadly, Stacy Wood’s travel was delayed and she just missed the panel due to a freak late season snowstorm) to speak about some of my recent work and thoughts on social media and digital preservation for government materials. It’s mostly unedited although I did go off-script a bit. Let me know your thoughts on the phrase “Tweets May Be Archived” and be sure to check out the paper, co-authored by Amelia Acker and myself, that inspired this talk here: https://doi.org/10.1002/pra2.2017.14505401001. Happy reading!


My inspiration for this talk comes from a phrase I’ve thought a lot about for the past year or so: “tweets may be archived.” When President Obama controlled the @POTUS Twitter account, the bio included this line: “Tweets may be archived.” Now, the @POTUS account specifies “Tweets archived” and links to a White House privacy policy at https://wh.gov/privacy indicating that they “may” collect mentions and other interactions with official government accounts, but provides little in the way of technical details.

I’m here to say that “tweets may be archived” is not good enough when it comes to preserving and providing access to social media data created by public sector organizations. These posts, and the interactions with them by citizens, bots, and other platform users are critical elements for maintaining understandability of these materials over time.

The public needs to understand how platforms like Twitter and Facebook shape the data they make available to all users, not just the federal government. This is perhaps even more urgent following this week’s reports about the ways in which Cambridge Analytica used the Facebook Graph API to mine and utilize vast amounts of user data. Transparency around APIs, interaction data, and preservation is also critically important for the federal government entities that produce federal records on these sites.

Many social media records created by elected officials and US federal government agencies are considered federal records in light of a 2013 NARA Bulletin on Social Media, as well as the Presidential Records Act. As such, these posts, tweets, and updates are required to be maintained over the long-term by the public sector. It is not enough to suggest that we may or may not be able to easily collect interaction data from social media platforms. The design and infrastructure of these platforms resist the types of digital preservation workflows developed around documents, images, scientific data, and other digital objects.

To put this in OAIS terms, if you don’t know what is in your SIP, how can you plan to describe, preserve, and provide access to the information it contains over time?

In some recent research I conducted with my colleague Amelia Acker, we found that the Facebook and Twitter data files from the Obama Administration’s digital transition team were rich and interesting but also confusing, sometimes incomplete, and not easily usable by researchers or other interested users. There was no contextual information corresponding to, say, the addition of new features to the platforms such as automatic retweets on Twitter or the introduction of Life Events on Facebook. For example, President Obama used the Life Event feature to mark the killing of Osama bin Laden soon after it was introduced to Facebook. Without contextual information, there’s no way to know why Life Events suddenly appear on the profile in 2011, or to understand what the Obama administration’s use of this feature suggests about their savviness when it came to social media.

Essentially, the data packages provided to the administration by Facebook, Twitter, and other platforms were the same as they would be for any user seeking to download their data, and we know they do not necessarily treat their users all that well! Who in the audience has downloaded their Facebook or Twitter data?

Furthermore, the interaction data, those retweets, comments and responses, were not included in the data package. Turns out that social media platforms treat federal records like those of any other user which is perhaps minimally acceptable but not good enough for preservation purposes. Instead of mentioning that engagement with federal entities on social media “may” be archived, social media preservation needs more transparency around the archival capabilities of platforms, changes to features and apps, and other metadata which will increase their legibility over time.

How can we hope to ever build systems that provide meaningful comparison across platforms if we don’t even have a good sense of what a profile data file from Facebook or Twitter contains? Federal social media records document the behavior of the government online, and the engagements with these posts from citizens, people around the world, and bots represent a critical and valuable element of these records, one that provides a fuller picture of platform activity. The digital preservation and curation community cannot rely on the private sector to produce records that are useful outside of the platforms for which they were designed because private companies have limited incentive to act this way. Additional approaches are needed to ensure that public sector information created on private platforms can remain accessible beyond the lifespan of any one platform. Today’s Twitter could be tomorrow’s MySpace, but federal records require thinking on a much longer timescale. “Tweets may be archived” is simply not good enough.

12/19/17

End of Year Update

Earlier this week I looked at this website and realized I had not posted anything on here since January! That’s far too long to go without any updates, so here are a few highlights of what I’ve been up to at the University of Maryland iSchool this year…

I have continued my research and work at the USDA National Agricultural Library, along with an additional project on the preservation of social media data from the Barack Obama presidential administration. Here are a few of the peer-reviewed papers I authored along with various colleagues that were published during the past few months:

  • Kahn, E., Arbuckle, P., Kriesberg, A. (2017) Challenge Paper: Challenges to Sharing Data and Models for Life Cycle Assessment. Journal of Data and Information Quality. 9(1), https://doi.org/10.1145/3106236
  • Kriesberg, A., Huller, K., Punzalan, R., Parr, C. (2017) An Analysis of Federal Policy on Public Access to Scientific Research Data. Data Science Journal. 16, p.27. DOI: http://doi.org/10.5334/dsj-2017-027
  • Acker, A., Kriesberg, A. (2017) Tweets May Be Archived: Civic Engagement, Digital Preservation and Obama White House Social Media Data. Proceedings of the Association for Information Science and Technology. 54(1), 1–9. https://doi.org/10.1002/pra2.2017.14505401001

I’ve taught a total of three courses in the MLIS program to iSchool master’s students: INST 641: Policy and Ethics in Digital Curation, INST 643: Curation in Cultural Institutions, and INST 647: Management of Electronic Records and Information. I’ve included links to syllabi in the previous sentence to give an idea of the types of topics covered in these courses. My teaching experiences have been mostly very positive; I’ve enjoyed working with the students here in College Park and helping them develop the skills and knowledge to become successful archivists, librarians, and information professionals.

Beyond that, this year I traveled to Barcelona for the 9th RDA Plenary meeting, attended a workshop on the Impact of Digital Repositories, attended AERI in Toronto, and helped facilitate workshops here at the UMD Libraries on Wikipedia and steps to protect Endangered Data. It’s been quite a year!

I’d like to wish all my readers, colleagues, reviewers (even you, reviewer #2), family and friends a happy holiday season and new year. I’ll leave you with a seasonally appropriate GIF from the National Archives and a nod to next year’s Winter Olympics.


01/26/17

One Librarian, One Reference

Happy New Year! I still get to say that through the month of January. It’s been a while but I’m back to let you, my loyal reader, know that I am going to participate in an exciting event next week, Tuesday 1/31, at McKeldin Library on the UMD campus. We are hosting a Wikipedia Library #1lib1ref event, a mini edit-a-thon of sorts where librarians come together around the world to add references and citations to Wikipedia. This initiative is sponsored by the Wikipedia Library, with the goal of improving Wikipedia through connecting editors with librarians and reference resources.

An owl standing on a book

Longtime readers of this blog will know that I am a big proponent of Wikipedia, having edited and participated in public events in the past. I am very excited to meet fellow Wikipedians at UMD and perhaps convince some folks from the libraries and iSchool to get more involved with editing!

09/12/16

Fall Update: Teaching and Conferencing this Week

Greetings, dear readers. It’s been a while but I have been doing a lot of different things this summer! Now that the semester has started I can share the syllabus for the course I am teaching. It is my first time being 100% in charge of my own class and so far (two weeks in) I am really enjoying it. The course is INST643: Curation in Cultural Institutions. Here is a link to the syllabus. I put in a lot of work designing this course, so let me know what you think in the comments!

In unrelated news, I will be travelling to Denver, CO this week for the 8th Plenary Meeting of the Research Data Alliance. This will be my first RDA meeting, and comes after I was awarded an RDA/US Data Share Fellowship this summer. For this fellowship, I am studying the use of controlled vocabularies in agricultural information access systems. I am super excited to see old colleagues and make new ones at this conference. Look for me in the poster hall Thursday, Friday, and Saturday.

07/4/16

Web Archiving #Brexit

Like many of us around the world, I’ve been following the news out of the United Kingdom after the country voted to leave the EU late last month. In the aftermath of the vote, many Britons were shocked to discover that some of the “Leave” campaign’s promises related to the money paid to the EU by the United Kingdom were not going to come to fruition. These reversals are documented across the web, but this succinct Boing Boing post highlights the attempts by these politicians to erase their old campaign website from the internet. Thanks to the Internet Archive, the site continues its life as a cached copy, documenting the change to the website which removed content relating to increasing funding for the National Health Service, among other social programs. I was struck by the power of web archiving to document political movements as they are represented online, and by how it makes it more difficult for politicians to eliminate potentially embarrassing content from the internet.

This article reminded me of another excellent example of the power of web archiving, from the New Yorker article “The Cobweb” by Jill Lepore. She explained that the Internet Archive also preserved a copy of a website maintained by Ukrainian separatists which appears to show that this group was responsible for downing the Malaysia Airlines flight which went down over Ukraine on July 17, 2014. Why was this particular site crawled by Internet Archive bots? Well, because:

Anatol Shmelev, the curator of the Russia and Eurasia collection at the Hoover Institution, at Stanford, had submitted to the Internet Archive, a nonprofit library in California, a list of Ukrainian and Russian Web sites and blogs that ought to be recorded as part of the archive’s Ukraine Conflict collection.

I recognize that these two events are not particularly related, other than the fact that web archiving figures in our attempts to understand current events and monitor how people represent themselves and their politics online. As more of our collective lives as humans are lived out in digital spaces, resources like the Internet Archive will only become more valuable as a way of piecing the past together. If you haven’t explored the Wayback Machine, give it a shot! I guarantee you’ll find some really interesting/fun/terrible/amazing old websites on there, just punch in a few domains and have fun…
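If you would rather poke at the Wayback Machine from a script than from the search box, here is a minimal Python sketch. It assumes the Internet Archive's public availability endpoint at https://archive.org/wayback/available, which returns JSON describing the capture closest to a requested date. The function names (availability_url, closest_snapshot) are my own inventions for illustration, and the sketch only builds the query and reads a response you have already fetched, so it makes no network calls itself.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the Internet Archive's Wayback availability API.
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_url(site, timestamp=None):
    """Build a query URL asking for the capture of `site` closest to `timestamp`.

    `timestamp` is an optional string of digits such as "20160701" (YYYYMMDD).
    """
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABILITY + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived URL from a JSON response, or None if nothing was captured."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

To try it, open the result of availability_url("example.com", "20160701") in a browser (or fetch it with urllib.request.urlopen) and pass the response body to closest_snapshot. Swap in whichever domain and date you are curious about.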

(P.S. Jill Lepore is the best. Her first book, The Name of War: King Philip’s War and the Origins of American Identity was a major inspiration for my senior honors thesis in History. Read it! Or, at least read more of her articles in the New Yorker, they are awesome.)

03/3/16

Amsterdam and IDCC

Last week, I traveled to Amsterdam to attend and present at the International Digital Curation Conference. I wrote a post about the conference here on the Archives Lab site but I wanted to add a more personal touch here. Amsterdam was a beautiful city which I was happy to explore in between conference events.

Being me, I had to find an archive or library to slip into. I ended up popping in at the Stadsarchief, Amsterdam’s City Archives. It is a beautiful building which houses a few exhibition spaces as well as information about the UNESCO World Heritage sites in the area, including the entire city canal ring. The lower exhibition includes some of the city’s founding documents including the charter. It was a real treat!

Stadsarchief, Amsterdam, NL

As always, I was inspired by the conference and excited to attend IDCC again in the future. Thanks to everyone who stopped by my poster. Here’s a picture of it, via Twitter, and a link to it via the conference website.

02/21/16

Upcoming Conference: IDCC 16

I will be presenting a poster entitled “Agricultural Data Curation: Examples from a National Library” at the International Digital Curation Conference this week. This is the first time I will be publicly sharing the work I’ve been doing as part of my Post-doc, and I’m very excited! As you might be able to guess from the title, this poster presents initial results from my work with the Knowledge Services Division at the National Agricultural Library. We highlight the role collaboration plays in the four primary projects currently ongoing at the division.

Are you going to be in Amsterdam for IDCC? Let me know! I look forward to seeing old colleagues and meeting new ones.

01/26/16

Blizzard Movie Night Yields Unexpected Archivist

I am close to digging out from the historic blizzard which has blanketed the Washington DC region with 2 feet (maybe?) of snow. Since Thursday evening, I have spent a lot of time in my apartment and, on a whim, decided to watch Enough Said, starring James Gandolfini and Julia Louis-Dreyfus. The movie interested me because it was Gandolfini’s last; little did I know the surprise in store as the plot unfolded.

Gandolfini plays Albert, a recent divorcee and DIGITAL ARCHIVIST who works at a place called the “American Library of Cultural History” which houses a significant collection of television films. I’ll avoid spoilers that do not involve archives: Albert oversees digitization and creates metadata for archival episodes of television. What’s more, there is a scene in the closed stacks of the library, complete with a stolen kiss amongst the Hollinger boxes! The rest of the movie was great as well and is recommended for archivists, librarians, curators, and everyone else too :-). It was very well-acted and definitely worth a watch.

While doing some post-film googling, I discovered this post from an excellent site called reel-librarians about Enough Said as well. Add it to the blogroll!

12/2/15

Rosa Parks, Historical Memory and Public Space

Yesterday, as I boarded a bus in front of the University of Maryland Stamp Student Union (itself named after an influential figure on campus who served as the “Dean of Women” and increased the presence of women in College Park), I noticed something on the front seat of the bus. At first I thought it was a large check, similar to those ceremonially given to winners of Publishers Clearing House sweepstakes. As I focused on it, I realized what it actually was: a sign acknowledging the actions of Rosa Parks on a bus in Montgomery, Alabama 60 years ago on December 1. She famously refused to give up her seat to a white, male rider of the bus and was arrested. This led to the Montgomery Bus Boycott, which lasted for more than a year before city buses were desegregated.

Commemorate Rosa Parks Day | December 1

The sign had a quote, images, and a reminder of the anniversary of Rosa’s action. I loved the simplicity of the action and straightforward way in which the sign forces bus riders to confront a historically significant event. Through the temporary occupation of public space, this sign inserts itself into daily life and brings Rosa Parks into the present day, asking modern commuters like myself to consider the situation in which Rosa found herself in 1955. What would I do if I moved through a world in which a racist status-quo was designed into every aspect of the built environment? As I sat down at an empty seat, I considered the impact of Rosa’s actions and the current state of our country around race relations and justice.

Initially, I did not know who placed the sign but have since figured out that it was the UMD Dept of Transportation Services. Thanks Facebook. The DOTS logo is very small and unobtrusive and the sign is free of other university branding. This aspect of the event also stood out to me. The focus of this sign is on remembering Rosa Parks, not promoting diversity and cultural awareness of the university administration.

Kudos, University of Maryland DOTS. You put signs on the front seat of the university’s bus fleet and got at least one person (myself) thinking about the memory and legacy of the civil rights movement. I even went online and read more about Rosa Parks and her longstanding involvement with civil rights organizing and activism in Montgomery leading up to her refusal to move to the back of the bus. If you are interested, here’s a blog post providing more context about the culture of violence and racism in Montgomery before Rosa took action, and here’s a podcast about Claudette Colvin, a teenager who was also arrested for refusing to give up her seat on a Montgomery bus.

Any DOTS bus riders out there in internet-land? Did anyone else see temporary Rosa Parks memorials? I’ll be looking for something similar next year!

11/17/15

Literature on Digital Repository Policy Development

This week, I have been looking into the collections policies and other policy documents of digital repositories, specifically data repositories. The other day, I came across this First Monday (open access!) article, “A balancing act: The ideal and the realistic in developing Dryad’s preservation policy” which I thought was worth summarizing here.

The authors report on their process for developing the preservation policy for Dryad, a general purpose scientific data repository. A Preservation Working Group consulted peer repositories and selected four which directly informed their process. The working group identified work already taking place and considered what a Preservation Policy should contain in developing their final document. In the article, the authors highlight important lessons learned such as the need to maintain realistic expectations and consider the constraints of the technology currently in place at the repository.

I have found few articles reporting on policy development in this way and thought this was a good example to share. Often in digital curation contexts, policy development is an afterthought or individual process, rather than a collaborative effort with diverse inputs. While it can seem trite to go through the process of creating policy rather than “doing the work,” it is vitally important for the health of organizations to have meaningful and well-thought-out policies which can inform future practice and help introduce new members into ongoing work. Here’s to publishing articles like this in the future!