Explanations
words related to negative emotions or situations
Negative Logits
Token        Logit
ouver        -0.71
gat          -0.67
irrig        -0.66
vetted       -0.65
iltration    -0.63
vernment     -0.63
aeda         -0.62
authorized   -0.62
entials      -0.62
fielded      -0.61
Positive Logits
Token        Logit
omas         1.39
der          1.27
istically    1.25
istic        1.25
istical      0.90
stal         0.88
die          0.86
onic         0.81
hus          0.80
fully        0.79
Activations Density: 0.105%