INDEX
Explanations
mentions of individuals and their actions or characteristics
references to individuals and their roles or experiences
Negative Logits
CONTIN     -0.64
Compl      -0.58
âĵĺ        -0.58
rematch    -0.58
unless     -0.57
':         -0.55
maxwell    -0.55
STER       -0.52
CAR        -0.52
ravings    -0.52
Positive Logits
unsuccessfully  0.90
earlier         0.86
last            0.74
previous        0.70
beforehand      0.68
terday          0.67
womb            0.64
ago             0.64
unsuccessful    0.63
originally      0.58
Activations Density 1.783%