INDEX
Explanations
instances where the word "mistake" is present, regardless of context
terms related to errors or mistakes
Negative Logits
Token           Logit
Hughes          -0.81
Kaufman         -0.74
Levine          -0.73
homosexuality   -0.72
veget           -0.71
Arnold          -0.69
animate         -0.68
Subway          -0.66
injected        -0.66
eggs            -0.66
Positive Logits
Token           Logit
mist            3.52
Mist            2.52
Mist            2.25
search          1.23
Mistress        1.15
misc            1.11
Ranger          1.09
slightest       1.04
icion           1.01
ISS             1.01
Activations Density: 0.055%