Explanations
Words related to failure
Instances of the word "failed" and its variations
Negative Logits
Token        Logit
dar          -0.71
onen         -0.68
iser         -0.67
ript         -0.66
enfranch     -0.66
iliary       -0.64
utra         -0.63
Forward      -0.63
inda         -0.63
arya         -0.62
Positive Logits
Token        Logit
miser         1.46
lect          0.90
catast        0.90
horribly      0.90
dism          0.89
ingly         0.86
fully         0.84
muster        0.76
spectacular   0.76
afe           0.74
Activation Density: 0.028%