No, carbon-14 dating is not another way to meet your true love, unless you are looking to date a fossil. I have dated some fossils in my life, but that story is for a different blog. Seriously, C-14 dating is a technique used to determine the age of an artifact, which could be a pot sherd, a bone, wood, cloth, or even plant material. It is used a lot in archaeological and anthropological research, and the resulting C-14 measurements can be recorded and kept as data.
Ok, what is C-14?
What happens is very interesting. See: http://science.howstuffworks.com/environmental/earth/geology/carbon-141.htm
Actually, it sounds more like a demolition derby to me.
First, the sun and the rest of the universe send out cosmic rays, which sounds very science fiction to me, but it happens all the time; in fact, I read that every person gets hit by about half a million cosmic rays every hour. No wonder I feel beat all the time! These cosmic rays not only hit people, they also slam into atoms in the atmosphere, knocking loose energetic neutrons. These neutrons then go bombarding everything in their way, including banging into nitrogen atoms, and when a neutron hits a nitrogen-14 atom it turns that atom into carbon-14. (The ordinary, stable form of carbon, carbon-12, is already all around us.) Personally, I think the cosmic rays, neutrons and atoms need a time out for all this colliding and hitting.
Anyway, the key thing is that all this anti-social behavior results in C-14 atoms, which are radioactive, and as we all know, there is a half-life to anything radioactive. In this case, it turns out that C-14 has a half-life of about 5,700 years. But we are not finished yet. When carbon-14 interacts with oxygen here on planet earth, it turns into carbon dioxide, which the plants soak up and which we humans and our animal friends then take in by eating those plants. Yes, even if you are a vegetarian you are taking in carbon-14! Living things absorb a certain amount of radioactive C-14 along with ordinary C-12, and apparently the ratio in our systems stays pretty much constant.
Until we die.
Once we die, the amount of C-14 in our bodies starts to lessen, but the C-12 amount stays the same. Scientists have been measuring the carbon in plants, animals and other organic things, so they know about how much is present in, for example, a living tree. And using some fancy equations which I will not go into here, it is possible to measure how much C-14 is left in an organic artifact compared to how much C-12 is present, and from that work out its age. Scientists can do this for things that are up to about 60,000 years old. Isn't that cool! The technique was developed by a guy named Willard Libby in 1949.
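For the curious, those "fancy equations" mostly boil down to the half-life rule. Here is a minimal sketch in Python of how an age falls out of the fraction of C-14 remaining relative to a living sample; this is my own illustration of the basic idea, not any lab's actual calibration procedure:

import math

HALF_LIFE_YEARS = 5730  # C-14 half-life, often rounded to 5,700

def radiocarbon_age(fraction_remaining):
    """Estimate age in years from the fraction of C-14 left,
    where 1.0 means a living sample and 0.5 means one half-life."""
    return -HALF_LIFE_YEARS * math.log(fraction_remaining) / math.log(2)

print(round(radiocarbon_age(0.5)))   # about 5,730 years (one half-life)
print(round(radiocarbon_age(0.25)))  # about 11,460 years (two half-lives)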
OK, so how does this turn into a data file?
So, the data gathered in C-14 dating is the age of an artifact in years. It can be recorded as something like 10,000 years, 36,000 years, 700 years, and so on. And this is good, but most researchers also record other details such as where the artifact came from, a description of the artifact, and who gathered it and with what instruments. This could include the date the artifact was found; geographic details such as latitude, longitude, and depth; what was found next to or around the artifact; the name of the researcher or project; the equipment used; and maybe even a text description or abstract. All of these items can be recorded as fields in a spreadsheet. I am still researching this, but it seems that the kind and amount of metadata recorded varies depending on what is being dated.
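To make that concrete, here is a toy example of how a few C-14 results and their context could be laid out as rows in a spreadsheet, written out as a CSV file from Python. The field names and values are my own invention, not any project's standard:

import csv

# Hypothetical fields; real projects record whatever metadata fits the work
fields = ["sample_id", "age_bp", "material", "site_name", "latitude",
          "longitude", "depth_cm", "collected_by", "date_found"]
rows = [
    ["S-001", 10000, "charcoal", "Example Site A", 34.07, -118.44, 120, "J. Smith", "2010-06-15"],
    ["S-002", 700, "bone", "Example Site B", 35.12, -111.65, 40, "K. Jones", "2011-08-02"],
]

with open("c14_samples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(fields)
    writer.writerows(rows)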
At tDAR, a search reveals some datasets based on C-14 dating. Within tDAR the metadata is extensive and includes ways to identify and describe each column. There are details on the data type (text, numeric, etc.), the type of value, the category the measurement falls under, and, if one was created, the ontology used by the investigator to organize the details. Depositors also record items such as site name, type of site, the anthropological or archaeological culture (e.g., Late Archaic), the material being measured (e.g., fauna), and the method of collection or investigation type (e.g., excavation). And there are some generic items such as a record number, a DOI and resource language.
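In the same spirit, here is a rough sketch of the kind of column-level description a depositor might record for each field. These are not tDAR's actual field names, just my paraphrase of the idea:

# Hypothetical column descriptions, loosely modeled on what tDAR asks for
column_metadata = {
    "age_bp": {
        "data_type": "numeric",
        "value_type": "measurement",
        "category": "radiocarbon date",
        "ontology": None,  # an investigator-supplied ontology could be named here
    },
    "material": {
        "data_type": "text",
        "value_type": "coded value",
        "category": "material dated (e.g., fauna, charcoal)",
        "ontology": "project-specific material list",
    },
}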
So, what does this mean for long term data management?
Although there is a lot of effort required for finding artifacts and dating them, the results can be organized into a spreadsheet and described in such a way that others can use and re-use the material. tDAR has a useful metadata structure, but one could also use other XML-based metadata schemas. I am not sure if these data files will be around for the next 60,000 years, but with proper management they could be around for some time to come.
As I mentioned in an earlier post, I have found an archive called tDAR (http://www.tdar.org/), and the services they offer are compelling. Like most of the newer archival operations, their approach is to automate as many processes as possible. This means that they run checksums and fixity checks on the files under their care. There is some version control, and there are metadata fields to track provenance. Much of the information is entered by the depositor. The rest is automated by their "workflow engine," through which they can track, process, and manage files when they are deposited and over time.
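For readers who haven't run into "fixity" before, the idea is simply to record a checksum for each file at deposit time and recompute it later to confirm that not a single bit has changed. A minimal sketch in Python (my own illustration, not tDAR's actual workflow engine, and the file name is made up):

import hashlib

def file_checksum(path):
    """Compute a SHA-256 checksum of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

# At deposit time, store the checksum alongside the file's metadata;
# on a later fixity check, recompute and compare.
stored = file_checksum("dataset.csv")
if file_checksum("dataset.csv") != stored:
    print("Fixity check failed: the file has changed!")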
But here's the thing: in my situation, we have data we got a long time ago, up to 35-40 years ago. Initially I worked with punch cards, and over time these holdings have been converted and migrated to new storage media and adapted so that the data sets can still be used as operating systems change. This has sometimes required us to write little programs or take other steps to, for example, move files from EBCDIC to ASCII, or to make files usable as we went from mainframe operating systems to DOS to Windows. We have also had to be sure that where we had system files produced by SPSS, SAS, Stata, etc., or data plus a setup file (now our preferred archival mode), these were still usable in newer versions of the statistical software. I kept, and still keep, paper files (yes, slowly being converted to PDF) on everything we did with each file.
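To give a sense of what those little programs look like, here is a sketch of an EBCDIC-to-ASCII conversion in Python. I am assuming the common code page 037 flavor of EBCDIC and a made-up file name; the real mainframe files usually needed more care than this:

# Convert an EBCDIC (code page 037) text file to plain ASCII.
# cp037 is an assumption; actual files may use a different code page.
with open("old_mainframe_file.dat", "rb") as src:
    text = src.read().decode("cp037")

with open("converted_file.txt", "w", encoding="ascii", errors="replace") as dst:
    dst.write(text)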
We now use Data Documentation Initiative metadata fields in DDI Codebook (Section 1.0, Document Description, and Section 2.0, Study Description, as appropriate) to keep information about the software version(s). (For example, when we first started monkeying around with DDI we used a very odd freeware editor that we realized later did not easily move into other, better editors, so we now record the editor info in Section 1.0, and we record the stat package versions in the Section 2.0 elements.) It takes effort to find this out when a file is being evaluated. Sometimes it is in the header info and sometimes not. You have to dig to find it. But it is worth it, because we want to be sure we are documenting versions and clearly delineating provenance. We think this will matter to future researchers and archivists.
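To give a flavor of what we record, here is a rough sketch in Python of a DDI Codebook-style fragment holding software version information. The element names are approximate and the values made up; anyone doing this for real should check them against the actual DDI Codebook schema:

import xml.etree.ElementTree as ET

# Rough sketch only; a real DDI Codebook instance has namespaces and
# required structure that are omitted here.
codebook = ET.Element("codeBook")

doc_dscr = ET.SubElement(codebook, "docDscr")    # Section 1.0, Document Description
prod_stmt = ET.SubElement(ET.SubElement(doc_dscr, "citation"), "prodStmt")
editor = ET.SubElement(prod_stmt, "software", version="0.9")
editor.text = "Hypothetical freeware DDI editor"

stdy_dscr = ET.SubElement(codebook, "stdyDscr")  # Section 2.0, Study Description
note = ET.SubElement(stdy_dscr, "notes")
note.text = "Data file produced with SPSS 24; setup file verified in SPSS 28."

print(ET.tostring(codebook, encoding="unicode"))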
The info on versions and on what we have done to migrate files is also hand recorded in our SQL database of holdings, and we can run a little report from time to time to see if anything is getting kind of old. We keep track of websites with little hacks, etc., for making these stat package conversions, and also for other formats (for example, early versions of PDF did not easily convert to PDF/A). The process is tedious and labor intensive. We have had lots of help from our statistical consultants, and they know a number of helpful tricks for making data usable again. It takes a lot of time to convert older file formats, and we have a huge backlog.
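The little report is nothing fancy. Something along these lines, run against the holdings database, would do the job (the table and column names are made up for illustration, and I am using SQLite just to keep the sketch self-contained):

import sqlite3

# Hypothetical holdings table with one row per archived file.
conn = sqlite3.connect("holdings.db")
rows = conn.execute("""
    SELECT study_id, file_name, file_format, last_migrated
    FROM holdings
    WHERE last_migrated < date('now', '-10 years')
    ORDER BY last_migrated
""").fetchall()

for study_id, file_name, file_format, last_migrated in rows:
    print(f"{study_id}  {file_name}  ({file_format})  last migrated {last_migrated}")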
So, when I look at the newer repository operations, I am looking for where this kind of work gets done, how, and to what extent. I know that different file formats require different amounts of loving care and attention. For example, tDAR accepts formats that are either open standards like CSV, XLSX and TIFF, which are international standards that are not patent protected and can be implemented via publicly available specs, or industry standards like MDB or DOC files, which are not "open" but are widely used, and their system can convert the files when the need arises. And they say: "We do some conversion of files, but this is to provide more accessible formats, but not to archival formats. Why not? All of the formats we accept are either open standards or so commonly used that we don't see the need."
My experience is that truly managing the data files we handle at UCLA requires a lot of hands-on care, and for that reason we don't take much new material unless it meets our requirements for documentation and format. We are also too small a shop to handle a lot of data. We have a list of what we look for in an initial assessment, and if the depositor doesn't provide us with enough, we don't ingest it.
In some automated repository systems there are options for individuals to upload materials, put in some details, and voila, they can say it's archived. Having had the experiences I have had with statistical software, operating systems and storage media, this does not sound like enough, and therefore I can't say to what extent the materials managed by some of the better-known repository systems are truly being preserved. I am told that there is no repository system anywhere that addresses these issues.
Are these issues not being addressed because we have not figured out how to do so? Or are my concerns no longer relevant in today's technology environment? Or is it really that these checking processes are hard to automate, and that without automation they are too time consuming, labor intensive and expensive? If the answer to that last question is yes, then what about coming up with strategies to address it? I have always felt that it is better to manage a smaller collection of well-documented materials than to try to take everything. I'd rather feel that at least a few things are being preserved for someone to use in 40-50 years than have a lot of stuff that nobody can use.
Archaeology 2.0 provides a great overview of issues and concerns in managing data created, gathered and collected in research.
I'll just say it: I really like what tDAR is doing. Their website says, "tDAR is an international digital archive and repository that houses data about archaeological investigations, research, resources, and scholarship." Their work represents an international effort that includes the UK-based Archaeology Data Service and Digital Antiquity in the U.S. Other players are the University of Arkansas, Arizona State University, Pennsylvania State University, Washington State University, the SRI Foundation and the University of York. tDAR has a sustainable model for ongoing operation AND for preserving deposited materials into the future. This is the first repository I have encountered where there is substantial consideration of how changes in technology and software can affect the future usability of data. Their approach is to limit the data formats accepted and to use automated processes to check and migrate files when needed.
And they will work with a wide variety of materials, even with these limits.
I have also been pleased that the tDAR staff have been so helpful and responsive as I ask my questions. As I proceed with the project for the Cotsen Institute, I think tDAR is one of the leading candidates for repository choice.