INDEX
Explanations
terms related to political and social issues
NEGATIVE LOGITS
ORB           -0.15
ált           -0.15
ables         -0.15
erez          -0.15
isma          -0.14
ird           -0.14
iates         -0.14
ive           -0.14
edula         -0.14
(er           -0.14
POSITIVE LOGITS
speaking       0.38
-speaking      0.33
sound          0.30
Speaking       0.28
minded         0.26
sound          0.25
spe            0.25
Speaking       0.25
challenged     0.21
SOUND          0.21
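Logit lists like the two above are usually read as the feature's direct effect on the output vocabulary. The page does not say how they were computed. What follows is only a minimal sketch, under the assumption that the dashboard projects the feature's decoder direction onto each token's unembedding vector and keeps the most negative and most positive scores. The function names, the toy 3-dimensional vectors, and the placeholder tokens are hypothetical and are not taken from this feature:

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of the feature's decoder direction
    with each token's unembedding vector (assumed method, not confirmed)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return positive, negative


if __name__ == "__main__":
    # Toy data only: a 3-dim "decoder direction" and four placeholder tokens.
    decoder_dir = [1.0, 0.0, -0.5]
    unembed = {
        "alpha": [0.4, 0.1, 0.0],
        "beta": [0.2, 0.3, -0.2],
        "gamma": [-0.1, 0.0, 0.1],
        "delta": [0.0, 0.5, 0.2],
    }
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("POSITIVE", [(t, round(s, 2)) for t, s in pos])
    print("NEGATIVE", [(t, round(s, 2)) for t, s in neg])
```

A real model would have thousands of dimensions and a full vocabulary, so a vectorized matrix product would replace the Python loop; the ranking step stays the same.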
Activation Density 0.052%