INDEX
Explanations
words that indicate failure or shortcomings
Negative Logits
ization      -0.86
Vidite       -0.84
AndEndTag    -0.84
Demografie   -0.83
θρώ          -0.79
OGND         -0.78
uVar         -0.77
NUMX         -0.75
Elbe         -0.75
Rüyada       -0.75
Positive Logits
fail         2.15
fails        2.04
failed       2.02
Fail         1.94
Failed       1.88
fail         1.86
failed       1.80
fails        1.79
Fails        1.77
Fail         1.77
Activations Density: 0.073%