ELIXIR – sustainable infrastructure for bio info in Europe

Janet Thornton
Disclaimer: My notes. Fairly incoherent and probably not accurate 😉

ELIXIR is a European effort to co-ordinate bioinformatics infrastructure. It is currently a preparatory project producing a 10-20 year roadmap for infrastructure development to support research.

EU-funded: 32 partners across 13 member states, with €4.5 million of funding to define the scope and cost of the infrastructure.

Goals:
Co-ordinated data resources, integration and interoperability of data, links to data in other domains, open access to data, enhanced European competitiveness in bioscience, and addressing the need for increased funding and its co-ordination.

Bioinformatics is a very young science, so funding streams for infrastructure are not yet in place.

Stakeholders: users, experimentalists (data provision), resource providers (core and specialist), tool providers (bioinformaticians), and funders (government bodies, EMBL, the EU, charities, industry).

Challenges prompting ELIXIR: data growth, the global context, a large and distributed user base, preservation and accessibility of data, impact on the biosciences, and growth of funding.

The cost of maintaining data is insignificant compared with the cost of generating it, so it makes sense to fund maintenance.
Integration is increasingly important as academic, molecular-type data is increasingly needed by medicine, agriculture, etc.

ESFRI (the European Strategy Forum on Research Infrastructures) has a number of biology research infrastructure proposals. ELIXIR will support these.

Reports from the initial committee meetings (user base consultation) are due now and will define the scope and remit of ELIXIR. The next steps are an international agreement on goals and costs, and then working out how to fund it.

We can’t keep everything centralized, so more distribution is needed: a hub at the EBI and nodes in different member states.

Will provide: core and specialist data resources, compute centres, infrastructure for tools and services integration, support for Bio ESFRI projects, community support and training.

Database survey: 170 databases across the EU. Many of the core databases are at the EBI, but they are distributed in the sense that data providers are spread across Europe. There are also many specialist resources across the EU, all of which use the core resources as reference data. Database sizes follow a power law: most are under 10 GB, but a few are huge.

All have web browser queries, and some still offer email queries. About 70% offer data downloads and about 30% offer programmatic access. 39 of the 170 have some restrictions on data access (legal or practical).

A fairly high proportion have no funding. Most cost less than a million euros. About 40 million euros a year is currently being spent on these databases, and total investment to date is 308 million euros. 90% have less than three years of funding security.

Most have fewer than 50K unique users per month, but a few have many more. Most have fewer than 5 staff, a few have many, and many don’t have any staff at all. See poster E41 for details.

So ELIXIR needs to co-ordinate, prioritise and stabilise funding for these resources.

Databases are relatively under control compared with other aspects: standards and ontologies, literature, other domains (medical data, biodiversity data, etc.) and integration.

Standards development doesn’t need to be centralised, and it is fine for standards to come out of communities (e.g. OBO), but ELIXIR does need to encourage and publicise them.

Literature: an integrated, open access, text-based literature resource would be nice 🙂

Compute resources? Other domains deal with much larger-scale data (e.g. CERN), but they have fewer users, and bioinformatics data is growing exponentially. NGS data can’t just be moved around the web. So what do we need to keep, and should it be centralised? We probably need a biodata grid like CERN’s (only more complex).
Modularise the organisation of data resources. Build a network of biocomputing resources. Catalyse development of web services and cloud computing. Move programs to the data rather than the other way round. Work with EU supercomputing centres.

User priorities: integration, format compatibility, website usability.
Short term: access (programmatic, website, web services, downloads). Develop a well-maintained catalogue.
Long term: integration of data and tools. Encourage commercial tools to adopt open standards.

Co-ordinate training.

Comments:

Database developers should have to abide by standards in order to publish or to be funded by ELIXIR.
The global context is important, and ELIXIR will take international collaboration models into account in its funding approaches. Data sharing will be required.
ELIXIR is not about providing national infrastructure, which should come from per-country funding; it is only interested in pan-EU infrastructure. National nodes would probably be well placed to also provide pan-EU functions, though (shared compute etc.).

ELIXIR may call for node proposals from EU countries, although there is no actual funding yet and no mechanism for deciding which would be accepted, so any proposals are a bit hypothetical.
