INDEX
Explanations
phrases expressing strong emotions or opinions
Negative Logits
pload     -0.90
ulia      -0.79
ifully    -0.70
yip       -0.68
othy      -0.68
nesota    -0.68
Princ     -0.68
ammy      -0.68
amins     -0.67
iggs      -0.67
Positive Logits
ingrained      1.16
rooted         1.09
flawed         0.98
saddened       0.95
entrenched     0.92
indebted       0.90
intertwined    0.90
implicated     0.90
regret         0.88
wounded        0.87
Activations Density: 0.048%