Explanations
Key concepts related to policies, guidelines, and their implications for societal issues
Negative Logits
eld       -0.16
kie       -0.16
enk       -0.15
otti      -0.15
eda       -0.14
يلا       -0.14
agal      -0.14
elda      -0.14
Minority  -0.14
aille     -0.14
Positive Logits
ertia     0.18
umlu      0.15
å·        0.15
LINE      0.14
충        0.14
alone     0.14
_sock     0.14
nackt     0.14
ulle      0.14
řiv       0.14
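Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that setup; the function names `logit_weights` and `top_and_bottom`, the two-dimensional vectors, and the four-token vocabulary are all hypothetical toy data, not taken from this page or from any specific library.

```python
from typing import Dict, List, Sequence, Tuple

def logit_weights(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding column (hypothetical helper)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_and_bottom(weights: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens (most positive first) and the
    k lowest-scoring tokens (most negative first)."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]
    top = list(reversed(ranked))[:k]
    return top, bottom

# Hypothetical 2-dimensional toy model with a four-token vocabulary.
decoder_dir = [2.0, -1.0]
unembed = {"alone": [3.0, 1.0], "LINE": [1.0, 0.0],
           "eld": [-1.0, 1.0], "kie": [0.0, 2.0]}
top, bottom = top_and_bottom(logit_weights(decoder_dir, unembed), k=2)
print(top)     # [('alone', 5.0), ('LINE', 2.0)]
print(bottom)  # [('eld', -3.0), ('kie', -2.0)]
```

In a real model the unembedding has one column per vocabulary entry, so the same ranking runs over tens of thousands of tokens rather than four.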
Activation density: 0.178%
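Activation density is typically the fraction of evaluated tokens on which a feature's activation is above zero, so 0.178% would mean the feature fires on roughly 1.78 of every 1,000 tokens. A minimal sketch, assuming a threshold of zero; the function name `activation_density` and the toy activations are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of activations strictly greater than `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)

# Toy example: the feature fires on 2 of 1,000 tokens.
acts = [0.0] * 998 + [1.3, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.200%
```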