Exporting metadata to CSV
Any set of search results can be exported to a CSV file by clicking on “Export Current Search to CSV” at the top of the search results list. Try to make your search as specific as possible before exporting. In particular, consider whether you need to work with both asset and item metadata. Assets and items have different metadata profiles, so your data will have more columns if you include both in the export. This may cause unnecessary post-export cleanup work. If you do need to work with both, it is likely worthwhile to export them separately.
To download the file, click the link that appears at the top of the page. If it has disappeared, click on “CSV Exports” under the “Manage” dropdown on the navigation bar at the top of the page. This will take you to a page listing your recent exports. Exports are logged in Hyacinth so you can download them again later if you need to. This is very helpful if you accidentally overwrite data, because the export is a snapshot of the records before changes were made.
Note: Users can only see the CSV exports they have created themselves.
The CSV file will contain all data from the records in the search you exported. Depending on the records in the search, this can be a very large file of hundreds of columns and thousands of rows.
Processing metadata for remediation and re-import
Most operations will benefit from trimming the file to make it easier to work with. If the file is too large for spreadsheet programs to handle, this abridgment is best done with OpenRefine or Python. If the file is not too large, you can open it in a spreadsheet program and delete unneeded columns. Note the guidelines on spreadsheet editing programs: DO NOT OPEN THE FILE IN EXCEL. For collaborative work, opening the file in Google Sheets is recommended.
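For files that are too large for a spreadsheet program, the trimming step might look something like the following Python sketch. It is only an illustration, not an official Hyacinth tool: the function name trim_columns is hypothetical, and it assumes the first row of the export holds the column headers. It always keeps the _pid column and copies only the columns you name. If you keep any column from a field block, name every column in that block.

```python
import csv


def trim_columns(src, dst, wanted):
    """Copy CSV rows from src to dst, keeping _pid plus the wanted columns.

    src and dst are open text files (or file-like objects).
    wanted is a collection of header names to keep.
    Returns the list of headers written to dst.
    """
    reader = csv.DictReader(src)
    headers = reader.fieldnames or []
    if "_pid" not in headers:
        # Without _pid the file cannot be matched back to records.
        raise ValueError("export has no _pid column")
    # Keep _pid first, then the wanted columns in their original order.
    keep = ["_pid"] + [h for h in headers if h in wanted and h != "_pid"]
    # extrasaction="ignore" drops every column that is not in keep.
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return keep
```

When using this with real files, open both with newline="" as the csv module documentation recommends. UTF-8 is a reasonable encoding to try, but that is an assumption; check the export if characters look wrong.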
Never remove the _pid column from a spreadsheet. This is a unique identifier that tells Hyacinth which records you are editing.
Hyacinth headers are explained on this page.
The order of the headers on the spreadsheet does not matter. You can reorder them to suit your work.
When preparing a spreadsheet for Hyacinth ingest, bear in mind that fields will generally not be overwritten if they are not present in the import spreadsheet. It is good practice to omit columns that are not being changed. This minimizes the chances of introducing errors in the process of remediation.
There is one major exception to this assumption. Make sure you understand the implications of editing field blocks as explained below. Not keeping field blocks intact is the easiest way to accidentally overwrite data during remediation.
Recognizing field blocks in CSV exports
If a column header contains a colon (:), the field is part of a field block. If you update one of the fields in the block, you must include all the fields in the block in the import spreadsheet, or Hyacinth will assume the other fields in the block have null values. This can cause you to lose data, because the missing fields are treated as empty and overwrite existing data. For example, here is a field block of parent publication fields. You can tell they are in a block because they have the same prefix ("parent_publication-1:").
- parent_publication-1:parent_publication_doi
- parent_publication-1:parent_publication_issue
- parent_publication-1:parent_publication_page_end
- parent_publication-1:parent_publication_page_start
- parent_publication-1:parent_publication_title-1:parent_publication_title_sort_portion
- parent_publication-1:parent_publication_volume
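The prefix rule above can be applied mechanically to a list of headers. The sketch below is an illustration only: the function name group_field_blocks is hypothetical, and it assumes, as in the example above, that the text before the first colon identifies the block.

```python
from collections import defaultdict


def group_field_blocks(headers):
    """Group block headers by the prefix before their first colon.

    Headers without a colon (such as _pid) are not part of a block
    and are left out of the result.
    """
    blocks = defaultdict(list)
    for header in headers:
        if ":" in header:
            prefix = header.split(":", 1)[0]
            blocks[prefix].append(header)
    return dict(blocks)
```

Given the six headers listed above plus _pid, this returns a single block keyed by "parent_publication-1" that contains all six headers.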
Field blocks are also recognizable in the Hyacinth web UI by their presentation.
If you would like to update information in one of these fields using a spreadsheet created by CSV export, DO NOT delete the other columns that are a part of the block. Leave the existing data in the other columns and include all the columns in your import spreadsheet.
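Before importing, it can help to compare the headers of your edited spreadsheet against the headers of the original export. The following Python sketch is one way to do that. It is not part of Hyacinth: the function name missing_block_columns is hypothetical, and it assumes the text before the first colon identifies the block. It reports every column from the export that belongs to a block present in your import spreadsheet but has been left out.

```python
def missing_block_columns(export_headers, import_headers):
    """List export columns missing from the import sheet's field blocks.

    A column is reported when its block appears in the import sheet
    but the column itself does not. An empty list means every block
    in the import sheet is intact.
    """
    def block_of(header):
        # Headers without a colon are not part of any block.
        return header.split(":", 1)[0] if ":" in header else None

    present = set(import_headers)
    touched = {block_of(h) for h in import_headers} - {None}
    return [
        h for h in export_headers
        if block_of(h) in touched and h not in present
    ]
```

If the result is not empty, copy the listed columns and their existing data from the export back into your import spreadsheet before ingest.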
Questions?
If you are uncertain about anything, it is much better to ask than to plow ahead. Send questions to hyacinth-support@library.columbia.edu. Don't worry whether the question is big or small. Someone will gladly help you.