INDEX
Explanations
words related to failures or negative outcomes
references to failures in various contexts
Negative Logits
rete          -0.69
enfranch      -0.68
bern          -0.68
selves        -0.67
population    -0.66
atu           -0.66
riel          -0.65
utra          -0.65
rosse         -0.65
irin          -0.64
Positive Logits
miser         1.30
dism          0.82
DEV           0.81
failures      0.79
catast        0.78
Failure       0.78
horribly      0.77
afe           0.73
lust          0.72
fail          0.71
Activations Density: 0.029%