INDEX
Explanations
phrases related to failure
instances of failures or unsuccessful outcomes
Negative Logits
enfranch      -0.77
iliary        -0.73
til           -0.73
selves        -0.64
Revolution    -0.64
collar        -0.63
Austral       -0.62
ourt          -0.61
arya          -0.60
dar           -0.60
Positive Logits
miser          1.15
fail           0.97
fail           0.94
ingly          0.85
failures       0.82
failure        0.81
DEV            0.78
horribly       0.77
lect           0.77
catast         0.77
Activations Density: 0.014%