Explanations
instances of groups and organizations, especially in the context of societal issues
Negative Logits
Token       Logit
avra        -0.16
idor        -0.15
aign        -0.14
izza        -0.14
something   -0.14
pic         -0.14
ft          -0.14
é¬          -0.14
yor         -0.14
agues       -0.13
Positive Logits
Token       Logit
whose       0.24
which       0.21
that        0.21
mà          0.20
that        0.20
which       0.20
которые     0.20
whose       0.19
которая     0.18
который     0.18
Activations Density 0.142%