Explanations
occurrences of the word "loss" with a focus on negative consequences
repeated mentions of the term "loss."
Negative Logits
ENTS        -0.70
rouse       -0.66
enegger     -0.66
ATT         -0.65
ECK         -0.64
populated   -0.63
76561       -0.63
Caucas      -0.62
ansky       -0.62
chell       -0.59
Positive Logits
loss         1.05
aversion     1.05
Loss         0.98
loss         0.93
losses       0.88
iem          0.87
luster       0.75
prevention   0.73
lust         0.73
front        0.71
Activations Density: 0.013%