INDEX
Explanations
words related to incorrectness or errors
instances of the word "wrong" and its variations, indicating errors or failures
Negative Logits
Token        Logit
enance       -0.73
ILA          -0.72
Ri           -0.67
Swim         -0.66
Flavoring    -0.64
tsky         -0.62
Chill        -0.60
CARE         -0.60
Colbert      -0.59
kamp         -0.59
Positive Logits
Token        Logit
headed       1.40
fully        1.39
doing        1.04
do           1.03
fulness      0.98
ful          0.91
footed       0.90
sight        0.89
behavior     0.89
dest         0.89
Activations Density: 0.029%