INDEX
Explanations
mentions of social issues and the representation of marginalized voices
NEGATIVE LOGITS
штов       -0.15
Glover     -0.15
'gc        -0.15
ійс        -0.14
egt        -0.14
grily      -0.14
ornings    -0.13
vent       -0.13
λιά        -0.13
oader      -0.13
POSITIVE LOGITS
receives    0.35
receive     0.35
receiving   0.31
Receive     0.28
receive     0.27
Receive     0.25
received    0.24
RECEIVE     0.23
Rece        0.23
recibir     0.23
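Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below illustrates that idea under exactly that assumption; it is not confirmed to be how this dashboard computes its values, and the function name, toy 2-dimensional vectors, and vocabulary are hypothetical.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every token by the dot product of its unembedding column with
    the feature's decoder direction (assumed method), then return the k most
    positive and the k most negative (token, score) pairs."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Hypothetical toy data: a 2-d feature direction and five unembedding columns.
decoder = [1.0, 0.5]
unembed = {
    "receive": [0.3, 0.1],
    "received": [0.2, 0.0],
    "the": [0.0, 0.0],
    "vent": [-0.05, -0.1],
    "Glover": [-0.1, -0.1],
}
positive, negative = top_logit_effects(decoder, unembed, k=2)
print("positive:", positive)
print("negative:", negative)
```

In a real model the same dot product runs over the full vocabulary, which is why a single feature can push up several surface forms of one word at once.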
Activations Density 0.198%
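If the density is read as the share of sampled token positions on which the feature's activation is above zero (an assumption; the page does not define it), 0.198% corresponds to about 198 firing positions per 100,000 tokens, or roughly 1 in 505. A minimal sketch of that calculation; the function name and the zero threshold are hypothetical choices.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (assumed definition of activation density)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Hypothetical sample: 198 firing positions out of 100,000 tokens.
sample = [0.0] * 99_802 + [1.0] * 198
print(f"{activation_density(sample):.3f}%")
```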