INDEX
Explanations
instances where something has failed or been unsuccessful
instances of the word "failed."
Negative Logits
enfranch  -0.81
selves    -0.74
til       -0.71
istics    -0.71
utra      -0.68
edged     -0.66
tip       -0.66
inda      -0.65
Layer     -0.64
ized      -0.64
Positive Logits
miser     1.28
fail      0.92
failures  0.91
DEV       0.87
fail      0.85
Failed    0.84
failure   0.81
catast    0.81
dism      0.78
horribly  0.76
Activations Density: 0.017%