Cortext Manager Documentation

Adding metadata to pdf corpus

Daniel Bach asked 4 years ago

Dear CorText team,
Thank you for all the great work on your tool!
I have a problem indexing my metadata.
I have uploaded a corpus of PDF files and run terms extraction and network mapping on it.
I have also cleaned some terms from the terms extraction and successfully indexed the new terms list.
My problem is that I now have a Google Sheets file with metadata that I wish to add to my database so that I can project it onto my network map.
I cannot figure out how to do this with the Corpus List Indexer script. I download my CSV with tab separators, but I cannot get the metadata to display on my network visualization.
Link to the spreadsheet file I am attempting to index:
https://docs.google.com/spreadsheets/d/1sIuq11PT7ouw7Kd8If8VWCWv74H-1MpckUct2q__esM/edit?usp=sharing
I hope you can provide a step-by-step guide on how to do this.
 
All the best
 
Daniel Bach

Question Tags: adding metadata to PDF corpus, Corpus List indexation
Lionel Staff replied 4 years ago

See below!

2 Answers
Lionel Staff answered 4 years ago

Dear Daniel,

What you are describing looks good. From what I have understood, you are close to applying your metadata to your dataset!
You can follow the first part of the video tutorial below. To apply metadata to a collection of pdf, txt (…) files, you only have to use the file names as the variable that links the two sources of information (this variable is named filename in your dataset).
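Since the link relies on file names, it can help to check before uploading that the names in the metadata match the files in the corpus. The following is a minimal sketch, not part of CorText Manager: the helper names are hypothetical, and it assumes the metadata is tab-separated with a column named filename whose values match the file names exactly, extension included (the thread does not say whether the extension is expected).

```python
import csv
import io
import zipfile
from pathlib import PurePosixPath


def corpus_filenames(zip_path):
    """Return the base names of all files stored in the corpus zip."""
    with zipfile.ZipFile(zip_path) as archive:
        return {
            PurePosixPath(info.filename).name
            for info in archive.infolist()
            if not info.is_dir()
        }


def metadata_filenames(tsv_text, key="filename"):
    """Return the values of the linking column in tab-separated metadata."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[key].strip() for row in reader}


def compare(corpus_names, metadata_names):
    """Return (files without metadata, metadata rows without a file)."""
    return corpus_names - metadata_names, metadata_names - corpus_names
```

If both sets returned by `compare` are empty, every document has exactly one matching metadata row under these assumptions. Any name that appears in either set is worth correcting in the spreadsheet before indexing.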

In addition, if you do not want to bother with temporal information, since you already have year information in your metadata, please rename that column to “ISIpubdate”. After the metadata indexation, CorText Manager will be able to use this information directly in all scripts where temporal information can be used.
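The renaming can be done in the spreadsheet itself, but it can also be scripted on the exported file. This is a minimal sketch under stated assumptions: the export is tab-separated, the year column is called year (a hypothetical name, since the thread does not give it), and `rename_column` is an illustrative helper rather than a CorText function.

```python
import csv
import io


def rename_column(tsv_text, old_name, new_name="ISIpubdate"):
    """Rename one header in tab-separated text and return the new text."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        raise ValueError("metadata is empty")
    header = rows[0]
    if old_name not in header:
        raise ValueError(f"column {old_name!r} not found in {header}")
    # Only the header row changes; the data rows are written back untouched.
    rows[0] = [new_name if name == old_name else name for name in header]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = "filename\tyear\nreport_a.pdf\t2019\n"
    print(rename_column(sample, "year"))
```

Raising an error when the column is missing avoids silently uploading a file that still lacks the “ISIpubdate” header.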

This video demonstrates how to build a corpus from txt files, enrich it with proper time steps and use the Distant Reading script:

I hope it helps!
Lionel

Daniel Bach replied 4 years ago

Dear Lionel

Thank you for your response!
I have now tried following the steps in the video, but the output that I get from the Corpus List Indexer script has no list object, nor am I able to project metadata on my co-occurrence map.

I see that in the video there is an option to parse the CSV that is uploaded with the metadata. There seems to have been an interface change since the video was made? When I click “upload file” no parsing step is prompted, and if I try to use the data parser script I can only parse the original zip file containing my PDFs, not the new CSV file.

I hope you will be able to help me with this problem.

All the best

Daniel Bach

Lionel Staff answered 4 years ago

Dear Daniel 
Yes, exactly: we have a new upload button / process! We hope it is simpler this way.

  1. Click on the “upload a file” button
  2. Drag and drop your corpus / dataset / documents / zipped file
  3. Wait until the end of the upload process
  4. When it ends, the drag and drop section may become green: right click on it, and you will go directly to the parsing step.

In any case, at any time, you can go to the script list (start script > Corpus > Data parsing) and find the parsing script to parse your zipped file independently of the uploading step.
I hope it helps!
L

Lionel Staff replied 4 years ago

This question is related to that one: https://docs.cortext.net/question/terms-list-in-type-of-data-missing/
And the process has now been documented here: https://docs.cortext.net/upload-a-resource/#upload-process
