INDEX
Explanations
personal pronouns (I, we, you) and verbs related to action or observation
instances of personal pronouns and references to individual experiences
Negative Logits
olute        -0.71
Travels      -0.67
geoning      -0.65
Kard         -0.64
ardless      -0.63
Fill         -0.60
MFT          -0.58
Outbreak     -0.58
Prometheus   -0.58
ipient       -0.56
Positive Logits
disliked     1.14
dislike      1.06
learned      0.98
learnt       0.96
noticed      0.96
liked        0.91
hated        0.91
Learned      0.88
wished       0.88
overlooked   0.86
Activations Density 0.147%