INDEX
Explanations
phrases related to negative outcomes or shortcomings
instances of the word "failure."
Negative Logits
selves      -0.80
enfranch    -0.70
rete        -0.70
utra        -0.68
esthetic    -0.65
estamp      -0.64
Ec          -0.64
arbon       -0.62
orgetown    -0.62
ocard       -0.61
Positive Logits
miser       1.08
failures    0.87
DEV         0.82
Failure     0.81
failure     0.81
ulence      0.74
rate        0.73
istence     0.72
luster      0.72
Failure     0.71
Activations Density 0.024%