INDEX
Explanations
names or pronouns referring to specific individuals
references to specific individuals and their actions or statements
Negative Logits
ĸļ              -0.82
animate         -0.64
murderer        -0.64
Load            -0.62
UTE             -0.62
lifes           -0.61
«ĺ              -0.60
transformative  -0.60
Nirvana         -0.60
¾               -0.59
Positive Logits
intends         1.65
expects         1.63
hopes           1.55
wants           1.49
anticip         1.46
insists         1.46
believes        1.45
prefers         1.43
vows            1.37
proposes        1.30
Activations Density: 0.426%