INDEX
Explanations
documents with mentions of specific individuals and medical conditions
instances of reported speech or statements made by individuals
Negative Logits
hub            -0.71
selves         -0.59
eps            -0.57
Their          -0.55
atters         -0.52
THEIR          -0.51
unison         -0.51
odes           -0.51
Hole           -0.51
Recommended    -0.50
Positive Logits
himself        0.95
remorse        0.74
confess        0.69
confessed      0.68
sul            0.66
inco           0.65
briefly        0.63
calmly         0.63
voluntarily    0.62
¶æ             0.62
Activations Density 1.481%
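
The page does not say how the density figure is computed. A minimal sketch, assuming it is the share of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and only illustrate that reading:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 3 of 8 token positions.
toy = [0.0, 0.0, 1.2, 0.0, 0.4, 0.0, 0.0, 2.5]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 37.500%
```

Under this reading, a density of 1.481% would mean the feature is active on roughly 15 of every 1,000 sampled tokens.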