INDEX
Explanations
words related to failures or negative outcomes
terms related to failures or negative outcomes
Negative Logits
ingham    -0.75
othermal  -0.74
venants   -0.74
trak      -0.71
atures    -0.71
ignty     -0.68
otine     -0.68
weights   -0.66
types     -0.66
akens     -0.66
Positive Logits
erella      0.82
mishand     0.78
itous       0.78
disastrous  0.77
ilton       0.74
bung        0.73
botched     0.72
miser       0.72
Spac        0.71
Ukrain      0.69
Activations Density 0.026%