Thursday 19 May 2011

Data Management Planning and Practices for Research Centres and Programmes Seminar

The Data Management Planning for ESRC Research Data-rich Investments project (DMP-ESRC) ended in May 2011. To conclude the project, the UK Data Archive Research Data Management Support Service hosted a Data Management Planning and Practices for Research Centres and Programmes Seminar at the Royal Statistical Society in London. We invited a range of speakers in different roles from research centres and hubs that cooperated with the project, as well as an audience of researchers, managers, and UK funding councils.

Thank you to all who attended, and especially to those who came to speak. In sharing their experiences and insights, they provided an understanding of the issues faced by large research hubs and of their resource needs. Here’s a summary of their presentations.


The day began with the JISC Managing Research Data programme manager, Simon Hodson. Simon outlined the context in which DMP-ESRC operates: an international drive towards good data management. Technology is driving large-scale data generation, and funding bodies internationally are acknowledging not only this phenomenon but also the benefits of managing it effectively. These benefits include greater efficiency, methodological and research innovation, and improved research integrity.

Whilst much good work has been done in the last decade at a generic level, there is a lack of discipline-focused materials, effective infrastructure, and support and training for researchers. The JISCMRD programme aims to remedy these deficiencies, and DMP-ESRC is one project within its wider strands.


Veerle Van den Eynden of the UK Data Archive presented the arguments for data management planning. Planning and preparing data has always underpinned research, but increasingly there is also a requirement to share data. Consequently, data need to be collected to high quality standards to ensure they are sustainable over time and understandable to future researchers.

Veerle highlighted the experiences of RELU, which was a pioneer in data management planning; the Timescapes qualitative longitudinal archive; and the Economic and Social Data Service, which not only provides data for reuse but also supports researchers in data management.

The messages from these examples are that implementation is critical: planning does not mean good data management will happen. Embedding good data management practice in the research process is also cost effective. Recognising problems early, such as limits to data sharing arising from consent, ownership or anonymisation, and addressing them early eases the pressure at the end of a project to prepare data for archiving. One final lesson: good practice already exists, and encouraging researchers to share practice, through whatever avenues, is invaluable.


Sue Venn, Research Officer on the SomnIA project, part of the New Dynamics of Ageing programme, spoke about her responsibility for data management. SomnIA exemplifies the challenges of modern collaborative research.

It was a large project: multi-disciplinary and multi-institutional. It drew upon external partners and cross-council research funding, and contained different work packages running on different timescales. These work packages were occasionally interlinked but also separate. The project generated data in both standard forms (such as focus groups) and highly specialised forms, and it also reused existing data.

A requirement from an ethical review board that data be encrypted forced the project to begin addressing data management issues, specifically encryption, early on. After consultation with IT support, the project adopted SharePoint, then known as Groove, as a platform. In retrospect this consultation was critical, as it enabled the project to source a platform suited to its needs and concerns. Teams were able to work inside collaborative spaces that allowed controlled levels of access to data as well as project-wide access to documentation and outputs.

The experience of SomnIA highlighted the value of collaborative software: it provided a central, up-to-date and satisfactorily secure location. Access was controlled through individual accounts rather than machines, which allowed both home and office access. Critically, version control eliminated the need for, and the risks created by, emailing draft copies back and forth.

A critical lesson from the SomnIA experience is that all team members need to engage with the software for it to work. This was ensured by offering no alternative to using the platform. Usage was also reinforced by suspending members who were inactive or did not log in to SharePoint.

However, there are issues projects should be aware of. First, dedicated time needs to be factored in for training, both initial and ongoing. People have different levels of experience with platforms, so training can take time. Second, training is ongoing as people leave and join the project. Third, at least one person needs to act as trainer and be responsible for maintaining the platform. Finally, in a long-term project, versions, updates and compatibility can become issues, both with the software itself and with different software versions at different institutions.


Catherine Butt of the Third Sector Research Centre gave the seminar a view from the role of centre manager. The conclusions she drew from her experience were to include data management in any funding application or project design, not as an add-on, and to make sure adequate resources are built into the budget. Make it a priority and address it early. It is critical that data management is understood by all staff, not just researchers, which involves a degree of coercion and a commitment to training. However, barriers to access must be low.

Cathy outlined the data management structure of the TSRC. As Centre Manager, Cathy formally reports on data management issues every six months. The centre has established a data management committee that includes researchers using quantitative and qualitative approaches, organisational staff, and a centre co-director.

One final message from Cathy’s presentation was that the role of support staff is much more about maintaining a relationship. That relationship is constantly negotiated and renegotiated through discussion and team meetings, depending on the issues being faced. Support staff help both parties understand what they need to do, rather than jumping straight to a solution.


Peter Robbins, Associate Director of Innogen, spoke about what centre managers need: specifically, to facilitate the data management that researchers already do naturally, and to enhance and focus this into data management embedded within centres.

The Centre Director’s role focuses on two sets of needs: data management needs, which means establishing a set of procedures flexible enough to allow for different disciplines and types of data; and data sharing needs, which means knowing how data are to be shared so as to protect information and publication needs.

Directors need to work with archives, funders and researchers to develop flexible and appropriate policies that can be integrated into standard data collection without becoming a significant additional burden.


Kristine Doronenkova from the ESRC summarised the funding council’s role in data management. The ESRC are committed to data sharing through services such as ESDS, and are trying to generate a data management culture through this and through the new data management planning requirement. These plans will be reviewed as an integral part of the application, with implementation enforced through annual reporting and preservation ensured through ESDS.

The ESRC want data management planning to be kept simple and helpful for researchers. A plan requires researchers to assess existing data sources; provide information on new data to be created; describe how the quality, documentation and metadata of those data will be assured; and detail back-up and security measures. Importantly, it also requires researchers to address perceived difficulties up front, including data ownership issues, and to outline data management responsibilities and the steps required to offer data for archiving.


The final session of the day involved groups discussing the question: what three things would help you better manage and look after your data? Discussion generated a range of support ideas, from the specific to the general. Suggestions ranged from legal advice on the Data Protection and Freedom of Information Acts to building a network of data management support. Such a network would include personal support from a list of experts: contacts who can be consulted within and across centres and communities as reliable sources of advice and support in research, for example a database of reliable transcribers. Participants also wanted to get researchers engaged with data managers, and to change the incentives for doing data management through significant financial or career advancement. Finally, participants expressed a wish for acknowledgement that funders are asking them to do more for less.

Wednesday 6 October 2010

Experiences with SharePoint/Groove?

One of the issues arising from this project is the practicalities of working either off-site or across institutions, namely: how can researchers maintain effective control of data and data collections when they are based in different locations or working away from their institution? One potential solution that has been suggested in the course of our data management interviews is Microsoft SharePoint 2010/Groove, and I am interested in following up on this. So, does anyone have experience of working with SharePoint 2010/Groove? I'm interested in hearing about its performance from a data management perspective, particularly in terms of shared workspaces, remote access, version control, and data security. Does it work? Does it work well? Is it easy to use and adapt to? What doesn't work well, and what are the bugs? What could it do better? I'd be grateful for any insights and testimonials.

Friday 27 August 2010

Last question...social science?

Social science
"the study of society and the manner in which people behave and impact on the world around us and includes disciplines such as economics, law, sociology, psychology, business studies, education, politics and international studies."

UK Strategy for Data Resources for Social and Economic Research 2009-2012, p.8

Thursday 26 August 2010

And a data collection is...?

Data collection
A data collection is typically comprised of three components: data, documentation and metadata. Occasionally, a fourth component of code exists. Data collections are typically organised by reference to a particular survey or research topic and cover a specific geographic area and time period.

UK Data Archive (2010), UK Data Archive Preservation Policy, pp.14-15

Wednesday 25 August 2010

Well, what's documentation then?

Documentation
Documentation is that portion of a data collection that is required in order to re-use data. It commonly covers the subjects of sampling design, methods of data collection, questionnaire/interview design, structure of the data files, lists of variables and coding schemes, details of weighting, confidentiality and anonymisation, and provenance of any secondary data used. It also includes licence arrangements and all materials obtained through the original negotiation and data deposit, as well as post-deposit information created during preservation and ingest activities. The terms metadata and documentation are often used interchangeably and there is overlap between the two, though documentation tends to have a structure that is specific to each data collection.

UK Data Archive (2010), UK Data Archive Preservation Policy, pp.14-15

Tuesday 24 August 2010

Ok, so data management...fine, but what is/are data?

Data
Data are all the material, regardless of format, which are intended to be analysed. As part of datasets, they are the primary element of a data collection. More precise definitions of data vary according to context. Quantitative data may refer to just the matrices of numbers or words that comprise a data file, but may also cover other information (metadata) held within a statistical package data file, such as variable labels, code labels and missing value definitions. Qualitative data might include interview transcripts as well as audio and video recordings (analogue or digital).

UK Data Archive (2010), UK Data Archive Preservation Policy, pp.14-15

Monday 23 August 2010

Progress report

This blog was intended as an experiment. The problem I've found in maintaining it is that it is difficult to be informative about the progress of the project, and the challenges and problems we are encountering, while maintaining confidentiality about where, and with whom, we are encountering them. Twitter takes care of the informative aspect, while this blog was starting to read more like a commentary on the standard of the spreads and hospitality provided by centres and programmes (which, by the way, has been excellent).

However, we are now around the mid-point. Last week our progress report was approved; in it we reported three main themes emerging from the project in terms of outputs and training.
  • Data ownership. A lack of awareness about who owns primary data, and a lack of consideration about the implications of using secondary data in terms of licences and copyright.
  • A need to devise strategies and tools for working across institutions.
  • Difficulty in getting good data from centres and programmes on data management costs.
For a fuller explanation of these themes, we have produced a report on current data management practices in the social sciences.

Our next challenge is to devise centre-specific strategies to address these themes, but strategies that can also be applied generically to social science data investments.