INDEX
Explanations
instances of the word "wrong" and its variations
Negative Logits
TIS       -0.56
+*        -0.53
calfe     -0.53
hibli     -0.52
Periods   -0.50
ofan      -0.47
CCM       -0.47
Deli      -0.47
Tahiti    -0.47
uride     -0.47
Positive Logits
wrong       1.23
wrong       1.20
Wrong       1.18
Wrong       1.13
WRONG       1.05
WRONG       1.05
wrongs      0.85
sbag        0.78
wrongful    0.71
incorrect   0.70
Activations Density: 0.009%