Explanations
terms related to political and social contexts
Negative Logits
est        -0.17
ORB        -0.16
ive        -0.16
erez       -0.16
ables      -0.15
IRD        -0.15
/fast      -0.15
able       -0.15
ird        -0.14
エル       -0.14
Positive Logits
speaking    0.31
sound       0.27
sound       0.27
Speaking    0.26
-speaking   0.25
SOUND       0.24
Sound       0.23
Speaking    0.23
minded      0.23
Sound       0.23
Activations Density 0.059%
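
For reference, a density figure like the one above is commonly read as the fraction of token positions on which the feature activates at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are assumptions for illustration and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires (activation > threshold), not a mean activation value.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 6 of which have a non-zero activation.
acts = [0.0] * 9994 + [1.2, 0.4, 3.1, 0.7, 2.2, 0.9]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.059% would mean the feature fires on roughly 6 of every 10,000 tokens sampled.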