INDEX
Explanations
words related to negative emotions or situations
expressions of sadness
Negative Logits
Token        Logit
ouver        -0.80
pegged       -0.70
authorized   -0.69
vernment     -0.67
rity         -0.64
sterling     -0.64
aeda         -0.64
irrig        -0.63
nomine       -0.63
lav          -0.62
Positive Logits
Token        Logit
omas         1.38
istic        1.33
istically    1.33
der          1.19
istical      1.01
omic         0.90
onna         0.86
hus          0.84
stal         0.83
die          0.82
Activations Density: 0.072%