INDEX
Explanations
instances of the word "dead."
Negative Logits
Winfrey      -0.62
ESM          -0.62
Avril        -0.61
LTS          -0.59
Mania        -0.57
CSP          -0.57
للمعارف      -0.56
TSM          -0.56
kasarigan    -0.56
ilever       -0.56
Positive Logits
dead          1.98
dead          1.84
Dead          1.66
Dead          1.64
DEAD          1.63
DEAD          1.52
muertos       0.96
muerta        0.95
isDead        0.94
мерт          0.94
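Dashboards of this kind commonly produce the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking every vocabulary token by the result. The page does not state how these particular numbers were computed, so the following is only a minimal pure-Python sketch under that assumption. The function `top_bottom_logits`, its parameters `decoder_dir`, `unembed`, and `vocab`, and the toy values are all hypothetical.

```python
def top_bottom_logits(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by the logit a feature's decoder direction contributes.

    Assumption (hypothetical, not confirmed by the dashboard):
      decoder_dir: list of d_model floats, the feature's decoder row
      unembed:     d_model rows, each a list of len(vocab) floats (W_U)
      vocab:       list of token strings
    Returns (k most positive, k most negative) lists of (token, logit).
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir and unembed must share d_model")
    # logit[t] = sum over i of decoder_dir[i] * unembed[i][t]
    logits = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy example with d_model = 2 and a four-token vocabulary (made-up values).
    pos, neg = top_bottom_logits(
        decoder_dir=[1.0, 0.5],
        unembed=[[2.0, 1.5, -0.5, 0.0], [0.0, 0.5, -0.3, -1.0]],
        vocab=["dead", "Dead", "Winfrey", "ESM"],
        k=2,
    )
    print("positive:", pos)
    print("negative:", neg)
```

Sorting the full vocabulary is simple but costs O(V log V); with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.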
Activation density: 0.007%
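Activation density is usually read as the fraction of sampled tokens on which the feature fires. The dashboard does not give its exact definition or dataset, so this is a minimal sketch assuming "fires" means an activation strictly greater than zero; the helper names `activation_density` and `as_percent` are hypothetical. Under that assumption, 0.007% corresponds to 7 firing tokens out of every 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def as_percent(fraction, digits=3):
    """Format a fraction as a percentage string, e.g. 7e-05 -> '0.007%'."""
    return f"{100 * fraction:.{digits}f}%"


if __name__ == "__main__":
    # Made-up data: 7 firing tokens among 100,000.
    acts = [0.0] * 99_993 + [0.4] * 7
    print(as_percent(activation_density(acts)))
```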