DReAMy is a toolkit to automatically analyse and annotate textual dream reports. The current annotation system is based on three of the Hall & Van de Castle (HVDC) features, namely Characters, Emotions, and Activities, but we look forward to expanding DReAMy's capabilities. For further technical and theoretical details, please refer to Bertolini et al., 2024 (CLPsych-WS) and Bertolini et al., 2024 (Sleep Medicine). You can also follow us on Twitter to keep up with updates and relevant work!
DReAMy can be easily installed via pip, and we recommend using a virtual environment with Python 3.9 or 3.10 installed. If you wish to play with or query DReAMy's model, you can use the 🤗 Space DReAM.
pip install dreamy
At the moment, DReAMy has four main features:
- Data: download datasets, containing dream reports from DreamBank.
- Analyse: collect (contextualised) embeddings, dimensionality reduction, clustering. (Colab tutorial)
- Anonymise: mask potentially sensitive entities (e.g., names, locations, organisations) with numbered concepts (e.g., Person1, Person2; Location1; Organisation1). (Colab tutorial)
- Annotate: label dream reports following different HVDC features, using one of three tasks – entity recognition, sentiment analysis, and relation extraction. (Colab tutorial)
Usage examples of all features can be found in the code below, and in the tutorials in the dedicated folder.
DReAMy has direct access to two datasets: a smaller, English-only one (~ 22k reports) with more descriptive variables (such as gender and year of collection), and a larger one (~ 30k reports) with reports in both English and German. You can download them using the simple code below.
import dreamy
# choose between "base" (~ 22k reports, EN-only & more descriptive variables)
# and "large" (~ 29k reports, reports in EN and DE, only series as descriptive variables)
database = "base"
dream_bank = dreamy.get_HF_DreamBank(database=database, as_dataframe=True)
dream_bank.sample(2)
index | series | description | dreams | gender | year |
---|---|---|---|---|---|
5875 | blind-f | Blind dreamers (F) | I'm at work in the office of a rehab teacher named D, a transistor radio is on, I held it in my hand and placed it on my desk. [...]. | female | mid-1990s |
12888 | emma | Emma: 48 years of dreams | I go to Pedro's house, he is fixing his bike. [...] | female | 1949-1997 |
You can also use DReAMy to easily extract, reduce, cluster, and visualise (contextualised) encodings (i.e., vector embeddings) of dream reports with a few simple lines of code. At the moment, you can choose between four models, which are the combinations of small/large and English-only/multilingual.
import dreamy
# Get some data in a list form
list_of_reports = dream_bank["dreams"].tolist()
# set up model and get encodings
model_size = "small" # or large
model_lang = "english" # or multi, for multilingual
device = "cpu" # set to "cuda" for GPUs
report_encodings = dreamy.get_encodings(
list_of_reports,
model_size=model_size,
language=model_lang,
device=device,
)
# reduce space
# you can choose between pca/t-sne
X, Y = dreamy.reduce_space(report_encodings, method="pca")
# Update your original dataframe with coordinates and plot
dream_bank["DR_X"], dream_bank["DR_Y"] = X, Y
You can then use your favourite library to visualise the results.
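The text above mentions clustering, so here is a minimal, hedged sketch of how the reduced coordinates could be grouped before plotting. It is a plain k-means written with the standard library only, not a DReAMy function: the name `kmeans_2d`, the toy coordinates, and the choice of seeding the centroids with the first `k` points are all assumptions made for illustration.

```python
import math

def kmeans_2d(points, k, n_iter=20):
    """Plain k-means on 2D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Toy coordinates standing in for the X, Y values of the reduced space.
X = [0.0, 0.2, 0.1, 5.0, 5.2, 5.1]
Y = [0.0, 0.1, 0.2, 5.0, 5.1, 4.9]
labels, centroids = kmeans_2d(list(zip(X, Y)), k=2)
print(labels)
```

With real data, the resulting labels could be stored as an extra dataframe column and used to colour the points in a scatter plot. For anything beyond a quick look, an established clustering implementation is likely a better choice than this sketch.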
Dream reports often contain personal information, such as references to family members, places, and organisations. To hide this information, you can use the anonimise function. This will use a multilingual named entity recognition model (that you can change at will) to find and replace tokens like "Amy", "Milano", or "Nike" with "PersonN", "LocationN", or "OrganisationN", where "N" is a number between 1 and the highest count for each type of entity, independently.
anon_dreams_list = dreamy.anonimise(list_of_reports, return_original=False, batch_size=16)
anon_dreams_list[0]
I was in the Organisation1 playground, and this time Person1 had taken off to somewhere in the school. So I ran over to Person2 and asked her if she knew where Person1 was, and she said: "I can't tell you that." [...]
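To make the numbering scheme concrete, the sketch below shows one way per-type counters can produce placeholders such as Person1, Person2, and Location1. It only illustrates the idea and is not DReAMy's implementation: the function `number_entities` and its input format (a list of surface form and entity type pairs, as one might collect from an NER model) are hypothetical.

```python
def number_entities(text, entities):
    """Replace each entity mention with a per-type numbered placeholder.

    `entities` is a list of (surface_form, entity_type) pairs
    (a hypothetical input format, chosen for illustration).
    """
    counters = {}      # entity type -> number of distinct entities seen so far
    placeholders = {}  # (surface form, type) -> assigned placeholder
    for surface, ent_type in entities:
        key = (surface, ent_type)
        if key not in placeholders:
            # Each type keeps its own counter, so numbering is independent.
            counters[ent_type] = counters.get(ent_type, 0) + 1
            placeholders[key] = f"{ent_type}{counters[ent_type]}"
    # Replace longer surface forms first, so a multi-word name is handled
    # before any shorter name it contains.
    ordered = sorted(placeholders.items(), key=lambda kv: -len(kv[0][0]))
    for (surface, _), tag in ordered:
        text = text.replace(surface, tag)
    return text

report = "Amy and Bob met in Milano. Amy works at Nike."
entities = [
    ("Amy", "Person"),
    ("Bob", "Person"),
    ("Milano", "Location"),
    ("Nike", "Organisation"),
    ("Amy", "Person"),  # repeated mention: reuses the same placeholder
]
anonymised = number_entities(report, entities)
print(anonymised)
# Person1 and Person2 met in Location1. Person1 works at Organisation1.
```

Note that plain string replacement also matches inside longer words, so a real pipeline would more likely replace the character spans returned by the NER model.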
As mentioned, the Annotate features revolve around three tasks:
- NER (named entity recognition): annotates reports with respect to the characters appearing in a report.
- SA (sentiment analysis): annotates reports with respect to which of the five Hall & Van de Castle emotions (anger, apprehension, confusion, sadness, happiness) appear in a report (possibly, also which character is experiencing them).
- RE (relation extraction): extracts entities (characters) in a report and the relations between them. At the moment, the only RE task available refers to the activity feature of the HVDC framework.
All the tasks are handled via the main annotate_reports function, and you can switch between them by simply changing the task argument. Check the dedicated tutorial for more.
task = "SA"
batch_size = 16
device = "cpu" # or "cuda" / device number (e.g., 0) for GPU
SA_predictions = dreamy.annotate_reports(
list_of_reports,
task=task,
device=device,
batch_size=batch_size,
)
SA_predictions
[[{'label': 'SD', 'score': 0.9931567311286926},
{'label': 'HA', 'score': 0.08149773627519608},
{'label': 'CO', 'score': 0.04012126475572586},
{'label': 'AN', 'score': 0.007265480700880289},
{'label': 'AP', 'score': 0.006692806258797646}],
...
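The scores in the output above do not sum to one, which suggests that each emotion is scored independently. Under that reading, a simple way to turn the scores into labels is to keep every emotion above a cut-off. The sketch below does this on a copy of the first prediction shown above. The helper `emotions_above`, the 0.5 threshold, and the mapping from label codes to emotion names (taken from the usual HVDC abbreviations) are assumptions, not part of DReAMy.

```python
# Assumed mapping from label codes to emotion names (usual HVDC abbreviations).
HVDC_EMOTIONS = {
    "AN": "anger",
    "AP": "apprehension",
    "SD": "sadness",
    "CO": "confusion",
    "HA": "happiness",
}

def emotions_above(predictions, threshold=0.5):
    """Return, for each report, the emotions whose score reaches the threshold."""
    return [
        [
            HVDC_EMOTIONS.get(p["label"], p["label"])  # fall back to the raw code
            for p in report_scores
            if p["score"] >= threshold
        ]
        for report_scores in predictions
    ]

# Copy of the first prediction from the output above.
example_predictions = [[
    {"label": "SD", "score": 0.9931567311286926},
    {"label": "HA", "score": 0.08149773627519608},
    {"label": "CO", "score": 0.04012126475572586},
    {"label": "AN", "score": 0.007265480700880289},
    {"label": "AP", "score": 0.006692806258797646},
]]
detected = emotions_above(example_predictions)
print(detected)  # [['sadness']]
```

The threshold is a free choice: a lower value returns more emotions per report at the cost of more false positives.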
Please refer to the tutorial(s) for more information and details.
If you wish to contribute, collaborate, or just ask a question, feel free to contact Lorenzo, or use the issue section.
If you use DReAMy in your project or research, please cite the work as
@article{BERTOLINI2024406,
title = {DReAMy: a library for the automatic analysis and annotation of dream reports with multilingual large language models},
journal = {Sleep Medicine},
volume = {115},
pages = {406-407},
year = {2024},
note = {Abstracts from the 17th World Sleep Congress},
issn = {1389-9457},
doi = {https://doi.org/10.1016/j.sleep.2023.11.1092},
url = {https://www.sciencedirect.com/science/article/pii/S1389945723015186},
author = {L. Bertolini and A. Michalak and J. Weeds}
}
DReAMy was supported by the Horizon 2020 project Humane AI (grant N° 952026).