This repository has been archived by the owner on Feb 12, 2019. It is now read-only.

changed item retrieval method & added skipped collections
ehanson8 committed Dec 17, 2018
1 parent e0621f0 commit ff5a48a
Showing 27 changed files with 416 additions and 344 deletions.
19 changes: 1 addition & 18 deletions README.md
@@ -10,29 +10,12 @@ All of these scripts require a secrets.py file in the same directory that must c
filePath = '/Users/dspace_user/dspace-data-collection/data/'
handlePrefix = 'http://dspace.myuni.edu/handle/'
verify = True or False (no quotes). Use False if using an SSH tunnel to connect to the DSpace API
skippedCollections = A list of the 'uuid' values of any collections that you wish the scripts to skip (e.g. ['45794375-6640-4efe-848e-082e60bae375'])
```
The 'filePath' is the directory into which output files will be written, and 'handlePrefix' may or may not vary from your DSpace URL depending on your configuration. This secrets.py file is ignored by the repository's .gitignore file so that DSpace login details are not inadvertently exposed through GitHub.
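A minimal secrets.py might look like the following sketch (every value below is a placeholder, not a real credential):

```python
# secrets.py -- placeholder values only; never commit real credentials
baseURL = 'https://dspace.myuni.edu'
email = 'dspace_user@myuni.edu'
password = 'changeme'
filePath = '/Users/dspace_user/dspace-data-collection/data/'
handlePrefix = 'http://dspace.myuni.edu/handle/'
verify = False  # False when connecting through an SSH tunnel
skippedCollections = ['45794375-6640-4efe-848e-082e60bae375']
```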

If you are using both a development server and a production server, you can create a separate secrets file with a different name (e.g. secretsProd.py) containing the production server information. When running each of these scripts, you will be prompted to enter the file name (e.g. 'secretsProd', without '.py') of an alternate secrets file. If you skip the prompt or mistype the file name, the script will default to the information in the secrets.py file. This ensures that you will only edit the production server if you really intend to.
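The fallback behavior described above amounts to a small selection rule; a sketch of the logic in Python 3 (the function name and the `available` set are hypothetical stand-ins — the real scripts use Python 2's raw_input and a dynamic import):

```python
def choose_secrets(entered, available):
    """Return the secrets module name to use: the entered name when it is
    non-empty and importable (simulated here by membership in `available`),
    otherwise fall back to the default 'secrets'."""
    if entered and entered in available:
        return entered
    return 'secrets'

modules = {'secrets', 'secretsProd'}
print(choose_secrets('secretsProd', modules))  # production file chosen
print(choose_secrets('secretsPord', modules))  # typo falls back to 'secrets'
print(choose_secrets('', modules))             # skipped prompt falls back too
```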

**Note**: All of these scripts skip collection '45794375-6640-4efe-848e-082e60bae375' for local reasons. To change this, edit the following portion of each script (typically found between lines 27 and 39).


Skips collection 45794375-6640-4efe-848e-082e60bae375:

    for j in range (0, len (collections)):
        collectionID = collections[j]['uuid']
        if collectionID != '45794375-6640-4efe-848e-082e60bae375':
            offset = 0

No collections skipped:

    for j in range (0, len (collections)):
        collectionID = collections[j]['uuid']
        if collectionID != 0:
            offset = 0
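This commit replaces that hard-coded uuid test with a membership check against the skippedCollections list, so one code path handles both cases; a minimal runnable sketch (Python 3, with a made-up second uuid):

```python
skippedCollections = ['45794375-6640-4efe-848e-082e60bae375']

collections = [
    {'uuid': '45794375-6640-4efe-848e-082e60bae375'},  # listed, so skipped
    {'uuid': '11111111-2222-3333-4444-555555555555'},  # hypothetical uuid
]

processedIDs = []
for collection in collections:
    collectionID = collection['uuid']
    # An empty skippedCollections list means nothing is skipped.
    if collectionID not in skippedCollections:
        processedIDs.append(collectionID)

print(processedIDs)
```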

#### [addKeyValuePairOnHandleCSV.py](addKeyValuePairOnHandleCSV.py)
Based on user input, adds key-value pairs from a specified CSV file of DSpace item handles and the value to be added to that item using the specified key. A CSV log is written with all of the changes made and a 'dc.description.provenance' note describing the change is added to the metadata of each item that is updated.
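The CSV input itself is not shown in the diff; a hypothetical two-column layout (the 'handle' and 'value' column names are assumptions for illustration) could be parsed like this:

```python
import csv
import io

# Hypothetical CSV of item handles and the value to add under the chosen key.
sampleCSV = io.StringIO(
    'handle,value\n'
    '1234.5/678,A new subject term\n'
    '1234.5/679,Another subject term\n'
)

pairs = [(row['handle'], row['value']) for row in csv.DictReader(sampleCSV)]
print(pairs)
```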

1 change: 1 addition & 0 deletions addKeyValuePairOnHandleCSV.py
@@ -23,6 +23,7 @@
password = secrets.password
filePath = secrets.filePath
verify = secrets.verify
skippedCollections = secrets.skippedCollections

startTime = time.time()
data = {'email':email,'password':password}
1 change: 1 addition & 0 deletions addKeyValuePairToCollection.py
@@ -48,6 +48,7 @@
password = secrets.password
filePath = secrets.filePath
verify = secrets.verify
skippedCollections = secrets.skippedCollections

startTime = time.time()
data = {'email':email,'password':password}
3 changes: 2 additions & 1 deletion addKeyValuePairToCommunity.py
@@ -48,6 +48,7 @@
password = secrets.password
filePath = secrets.filePath
verify = secrets.verify
skippedCollections = secrets.skippedCollections

startTime = time.time()
data = {'email':email,'password':password}
@@ -67,7 +68,7 @@
collections = requests.get(baseURL+'/rest/communities/'+str(communityID)+'/collections', headers=header, cookies=cookies, verify=verify).json()
for j in range (0, len (collections)):
    collectionID = collections[j]['uuid']
    if collectionID != '45794375-6640-4efe-848e-082e60bae375':
    if collectionID not in skippedCollections:
        offset = 0
        items = ''
        while items != []:
1 change: 1 addition & 0 deletions addNewItemsToCollection.py
@@ -45,6 +45,7 @@
password = secrets.password
filePath = secrets.filePath
verify = secrets.verify
skippedCollections = secrets.skippedCollections

startTime = time.time()

1 change: 1 addition & 0 deletions deleteBitstreamsFromItem.py
@@ -23,6 +23,7 @@
password = secrets.password
filePath = secrets.filePath
verify = secrets.verify
skippedCollections = secrets.skippedCollections

itemHandle = raw_input('Enter item handle: ')

56 changes: 31 additions & 25 deletions deleteKeyFromCollection.py
@@ -16,7 +16,7 @@
    print 'Editing Stage'
else:
    print 'Editing Stage'

parser = argparse.ArgumentParser()
parser.add_argument('-k', '--deletedKey', help='the key to be deleted. optional - if not provided, the script will ask for input')
parser.add_argument('-i', '--handle', help='handle of the collection to retrieve. optional - if not provided, the script will ask for input')
@@ -39,6 +39,7 @@
password = secrets.password
filePath = secrets.filePath
verify = secrets.verify
skippedCollections = secrets.skippedCollections

startTime = time.time()
data = {'email':email,'password':password}
@@ -61,6 +62,7 @@
offset = 0
recordsEdited = 0
items = ''
itemLinks = []
while items != []:
    endpoint = baseURL+'/rest/filtered-items?query_field[]='+deletedKey+'&query_op[]=exists&query_val[]='+collSels+'&limit=200&offset='+str(offset)
    print endpoint
@@ -69,32 +71,36 @@
    for item in items:
        itemMetadataProcessed = []
        itemLink = item['link']
        print itemLink
        metadata = requests.get(baseURL + itemLink + '/metadata', headers=header, cookies=cookies, verify=verify).json()
        for l in range (0, len (metadata)):
            metadata[l].pop('schema', None)
            metadata[l].pop('element', None)
            metadata[l].pop('qualifier', None)
            languageValue = metadata[l]['language']
            if metadata[l]['key'] == deletedKey:
                provNote = '\''+deletedKey+'\' was deleted through a batch process on '+datetime.now().strftime('%Y-%m-%d %H:%M:%S')+'.'
                provNoteElement = {}
                provNoteElement['key'] = 'dc.description.provenance'
                provNoteElement['value'] = unicode(provNote)
                provNoteElement['language'] = 'en_US'
                itemMetadataProcessed.append(provNoteElement)
            else:
                itemMetadataProcessed.append(metadata[l])
        recordsEdited = recordsEdited + 1
        itemMetadataProcessed = json.dumps(itemMetadataProcessed)
        print 'updated', itemLink, recordsEdited
        delete = requests.delete(baseURL+itemLink+'/metadata', headers=header, cookies=cookies, verify=verify)
        print delete
        post = requests.put(baseURL+itemLink+'/metadata', headers=header, cookies=cookies, verify=verify, data=itemMetadataProcessed)
        print post
        f.writerow([itemLink]+[deletedKey]+[delete]+[post])
        itemLinks.append(itemLink)
    offset = offset + 200
    print offset
for itemLink in itemLinks:
    itemMetadataProcessed = []
    print itemLink
    metadata = requests.get(baseURL + itemLink + '/metadata', headers=header, cookies=cookies, verify=verify).json()
    for l in range (0, len (metadata)):
        metadata[l].pop('schema', None)
        metadata[l].pop('element', None)
        metadata[l].pop('qualifier', None)
        languageValue = metadata[l]['language']
        if metadata[l]['key'] == deletedKey:
            provNote = '\''+deletedKey+'\' was deleted through a batch process on '+datetime.now().strftime('%Y-%m-%d %H:%M:%S')+'.'
            provNoteElement = {}
            provNoteElement['key'] = 'dc.description.provenance'
            provNoteElement['value'] = unicode(provNote)
            provNoteElement['language'] = 'en_US'
            itemMetadataProcessed.append(provNoteElement)
        else:
            itemMetadataProcessed.append(metadata[l])
    recordsEdited = recordsEdited + 1
    itemMetadataProcessed = json.dumps(itemMetadataProcessed)
    print 'updated', itemLink, recordsEdited
    delete = requests.delete(baseURL+itemLink+'/metadata', headers=header, cookies=cookies, verify=verify)
    print delete
    post = requests.put(baseURL+itemLink+'/metadata', headers=header, cookies=cookies, verify=verify, data=itemMetadataProcessed)
    print post
    f.writerow([itemLink]+[deletedKey]+[delete]+[post])


logout = requests.post(baseURL+'/rest/logout', headers=header, cookies=cookies, verify=verify)

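The rewrite in deleteKeyFromCollection.py changes the retrieval method: instead of editing items while still paging through /rest/filtered-items, the script first collects every item link and then edits in a second pass, so the paging window cannot shift as edits remove items from the filtered results. A runnable sketch of the pattern (fetch_page and edit_item are hypothetical stand-ins for the REST calls):

```python
def collect_then_edit(fetch_page, edit_item, page_size=200):
    """First pass: page through results and collect item links.
    Second pass: edit each collected item."""
    itemLinks = []
    offset = 0
    while True:
        items = fetch_page(offset)
        if not items:
            break
        itemLinks.extend(item['link'] for item in items)
        offset += page_size
    return [edit_item(link) for link in itemLinks]

# Hypothetical data source: two pages of results, then an empty page.
pages = {
    0: [{'link': '/rest/items/1'}, {'link': '/rest/items/2'}],
    200: [{'link': '/rest/items/3'}],
}
result = collect_then_edit(lambda off: pages.get(off, []),
                           lambda link: link + '/metadata')
print(result)
```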
55 changes: 30 additions & 25 deletions deleteKeyFromCommunity.py
@@ -16,7 +16,7 @@
    print 'Editing Stage'
else:
    print 'Editing Stage'

parser = argparse.ArgumentParser()
parser.add_argument('-k', '--deletedKey', help='the key to be deleted. optional - if not provided, the script will ask for input')
parser.add_argument('-i', '--handle', help='handle of the community to retrieve. optional - if not provided, the script will ask for input')
@@ -39,6 +39,7 @@
password = secrets.password
filePath = secrets.filePath
verify = secrets.verify
skippedCollections = secrets.skippedCollections

startTime = time.time()
data = {'email':email,'password':password}
@@ -66,6 +67,7 @@
offset = 0
recordsEdited = 0
items = ''
itemLinks = []
while items != []:
    endpoint = baseURL+'/rest/filtered-items?query_field[]='+deletedKey+'&query_op[]=exists&query_val[]='+collSels+'&limit=200&offset='+str(offset)
    print endpoint
@@ -74,32 +76,35 @@
    for item in items:
        itemMetadataProcessed = []
        itemLink = item['link']
        print itemLink
        metadata = requests.get(baseURL + itemLink + '/metadata', headers=header, cookies=cookies, verify=verify).json()
        for l in range (0, len (metadata)):
            metadata[l].pop('schema', None)
            metadata[l].pop('element', None)
            metadata[l].pop('qualifier', None)
            languageValue = metadata[l]['language']
            if metadata[l]['key'] == deletedKey:
                provNote = '\''+deletedKey+'\' was deleted through a batch process on '+datetime.now().strftime('%Y-%m-%d %H:%M:%S')+'.'
                provNoteElement = {}
                provNoteElement['key'] = 'dc.description.provenance'
                provNoteElement['value'] = unicode(provNote)
                provNoteElement['language'] = 'en_US'
                itemMetadataProcessed.append(provNoteElement)
            else:
                itemMetadataProcessed.append(metadata[l])
        recordsEdited = recordsEdited + 1
        itemMetadataProcessed = json.dumps(itemMetadataProcessed)
        print 'updated', itemLink, recordsEdited
        delete = requests.delete(baseURL+itemLink+'/metadata', headers=header, cookies=cookies, verify=verify)
        print delete
        post = requests.put(baseURL+itemLink+'/metadata', headers=header, cookies=cookies, verify=verify, data=itemMetadataProcessed)
        print post
        f.writerow([itemLink]+[deletedKey]+[delete]+[post])
        itemLinks.append(itemLink)
    offset = offset + 200
    print offset
for itemLink in itemLinks:
    itemMetadataProcessed = []
    print itemLink
    metadata = requests.get(baseURL + itemLink + '/metadata', headers=header, cookies=cookies, verify=verify).json()
    for l in range (0, len (metadata)):
        metadata[l].pop('schema', None)
        metadata[l].pop('element', None)
        metadata[l].pop('qualifier', None)
        languageValue = metadata[l]['language']
        if metadata[l]['key'] == deletedKey:
            provNote = '\''+deletedKey+'\' was deleted through a batch process on '+datetime.now().strftime('%Y-%m-%d %H:%M:%S')+'.'
            provNoteElement = {}
            provNoteElement['key'] = 'dc.description.provenance'
            provNoteElement['value'] = unicode(provNote)
            provNoteElement['language'] = 'en_US'
            itemMetadataProcessed.append(provNoteElement)
        else:
            itemMetadataProcessed.append(metadata[l])
    recordsEdited = recordsEdited + 1
    itemMetadataProcessed = json.dumps(itemMetadataProcessed)
    print 'updated', itemLink, recordsEdited
    delete = requests.delete(baseURL+itemLink+'/metadata', headers=header, cookies=cookies, verify=verify)
    print delete
    post = requests.put(baseURL+itemLink+'/metadata', headers=header, cookies=cookies, verify=verify, data=itemMetadataProcessed)
    print post
    f.writerow([itemLink]+[deletedKey]+[delete]+[post])

logout = requests.post(baseURL+'/rest/logout', headers=header, cookies=cookies, verify=verify)

82 changes: 49 additions & 33 deletions deleteKeyValuePairFromCollection.py
@@ -5,8 +5,7 @@
import csv
from datetime import datetime
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
import argparse

secretsVersion = raw_input('To edit production server, enter the name of the secrets file: ')
if secretsVersion != '':
@@ -18,15 +17,33 @@
else:
    print 'Editing Stage'

parser = argparse.ArgumentParser()
parser.add_argument('-k', '--deletedKey', help='the key to be deleted. optional - if not provided, the script will ask for input')
parser.add_argument('-v', '--deletedValue', help='the value to be deleted. optional - if not provided, the script will ask for input')
parser.add_argument('-i', '--handle', help='handle of the collection to retrieve. optional - if not provided, the script will ask for input')
args = parser.parse_args()

if args.deletedKey:
    deletedKey = args.deletedKey
else:
    deletedKey = raw_input('Enter the key to be deleted: ')
if args.deletedValue:
    deletedValue = args.deletedValue
else:
    deletedValue = raw_input('Enter the value to be deleted: ')
if args.handle:
    handle = args.handle
else:
    handle = raw_input('Enter collection handle: ')

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

baseURL = secrets.baseURL
email = secrets.email
password = secrets.password
filePath = secrets.filePath
verify = secrets.verify

collectionHandle = raw_input('Enter collection handle: ')
deletedKey = raw_input('Enter key to be deleted: ')
deletedValue = raw_input('Enter value to be deleted: ')
skippedCollections = secrets.skippedCollections

startTime = time.time()
data = {'email':email,'password':password}
@@ -38,36 +55,36 @@
status = requests.get(baseURL+'/rest/status', headers=header, cookies=cookies, verify=verify).json()
print 'authenticated'

itemList = []
endpoint = baseURL+'/rest/handle/'+collectionHandle
endpoint = baseURL+'/rest/handle/'+handle
collection = requests.get(endpoint, headers=header, cookies=cookies, verify=verify).json()
collectionID = collection['uuid']
collSels = '&collSel[]=' + collectionID

f=csv.writer(open(filePath+'deletedKey'+datetime.now().strftime('%Y-%m-%d %H.%M.%S')+'.csv', 'wb'))
f.writerow(['itemID']+['deletedKey']+['deletedValue']+['delete']+['post'])
recordsEdited = 0
offset = 0
items = ''
itemLinks = []
while items != []:
    items = requests.get(baseURL+'/rest/collections/'+str(collectionID)+'/items?limit=200&offset='+str(offset), headers=header, cookies=cookies, verify=verify)
    while items.status_code != 200:
        time.sleep(5)
        items = requests.get(baseURL+'/rest/collections/'+str(collectionID)+'/items?limit=200&offset='+str(offset), headers=header, cookies=cookies, verify=verify)
    items = items.json()
    for k in range (0, len (items)):
        itemID = items[k]['uuid']
        itemList.append(itemID)
    endpoint = baseURL+'/rest/filtered-items?query_field[]='+deletedKey+'&query_op[]=exists&query_val[]='+collSels+'&limit=200&offset='+str(offset)
    print endpoint
    response = requests.get(endpoint, headers=header, cookies=cookies, verify=verify).json()
    items = response['items']
    for item in items:
        itemMetadataProcessed = []
        itemLink = item['link']
        itemLinks.append(itemLink)
    offset = offset + 200
elapsedTime = time.time() - startTime
m, s = divmod(elapsedTime, 60)
h, m = divmod(m, 60)
print 'Item list creation time: ','%d:%02d:%02d' % (h, m, s)

recordsEdited = 0
f=csv.writer(open(filePath+'deletedKey'+datetime.now().strftime('%Y-%m-%d %H.%M.%S')+'.csv', 'wb'))
f.writerow(['itemID']+['deletedKey']+['deletedValue']+['delete']+['post'])
for number, itemID in enumerate(itemList):
    itemsRemaining = len(itemList) - number
    print 'Items remaining: ', itemsRemaining, 'ItemID: ', itemID
    metadata = requests.get(baseURL+'/rest/items/'+str(itemID)+'/metadata', headers=header, cookies=cookies, verify=verify).json()
print offset
for itemLink in itemLinks:
    itemMetadataProcessed = []
    print itemLink
    metadata = requests.get(baseURL + itemLink + '/metadata', headers=header, cookies=cookies, verify=verify).json()
    for l in range (0, len (metadata)):
        metadata[l].pop('schema', None)
        metadata[l].pop('element', None)
        metadata[l].pop('qualifier', None)
        if metadata[l]['key'] == deletedKey and metadata[l]['value'] == deletedValue:
            provNote = '\''+deletedKey+':'+deletedValue+'\' was deleted through a batch process on '+datetime.now().strftime('%Y-%m-%d %H:%M:%S')+'.'
            provNoteElement = {}
@@ -77,16 +94,15 @@
            itemMetadataProcessed.append(provNoteElement)
        else:
            itemMetadataProcessed.append(metadata[l])

    if itemMetadataProcessed != metadata:
        recordsEdited = recordsEdited + 1
        itemMetadataProcessed = json.dumps(itemMetadataProcessed)
        print 'updated', itemID, recordsEdited
        delete = requests.delete(baseURL+'/rest/items/'+str(itemID)+'/metadata', headers=header, cookies=cookies, verify=verify)
        print 'updated', itemLink, recordsEdited
        delete = requests.delete(baseURL+itemLink+'/metadata', headers=header, cookies=cookies, verify=verify)
        print delete
        post = requests.put(baseURL+'/rest/items/'+str(itemID)+'/metadata', headers=header, cookies=cookies, verify=verify, data=itemMetadataProcessed)
        post = requests.put(baseURL+itemLink+'/metadata', headers=header, cookies=cookies, verify=verify, data=itemMetadataProcessed)
        print post
        f.writerow([itemID]+[deletedKey]+[deletedValue]+[delete]+[post])
        f.writerow([itemLink]+[deletedKey]+[deletedValue]+[delete]+[post])

logout = requests.post(baseURL+'/rest/logout', headers=header, cookies=cookies, verify=verify)

1 change: 1 addition & 0 deletions editBitstreamsNames.py
@@ -32,6 +32,7 @@
password = secrets.password
filePath = secrets.filePath
verify = secrets.verify
skippedCollections = secrets.skippedCollections

startTime = time.time()
data = {'email':email,'password':password}
1 change: 1 addition & 0 deletions generateCollectionLevelAbstract.py
@@ -36,6 +36,7 @@
password = secrets.password
filePath = secrets.filePath
verify = secrets.verify
skippedCollections = secrets.skippedCollections

data = {'email':email,'password':password}
header = {'content-type':'application/json','accept':'application/json'}
