INDEX
Explanations
references to mistakes or inaccuracies
Negative Logits
edom       -0.79
tsky       -0.78
ILA        -0.76
gdala      -0.72
apy        -0.72
nai        -0.72
rises      -0.71
amen       -0.67
electric   -0.67
estine     -0.66
Positive Logits
uracy         0.88
margin        0.87
ously         0.85
corrected     0.79
fully         0.79
dece          0.77
corrections   0.77
error         0.75
perpetrated   0.75
correction    0.74
Activations Density 0.013%