INDEX
Explanations
topics related to societal responsibility and regulation
Negative Logits
anzi      -0.17
ียก       -0.16
віт       -0.16
Lopez     -0.16
μμε       -0.15
øre       -0.15
nackte    -0.15
ایر       -0.15
López     -0.15
okino     -0.15
Positive Logits
allowed   0.22
welcome   0.21
Allowed   0.18
allowed   0.18
odox      0.18
alth      0.18
Welcome   0.18
Allowed   0.18
Welcome   0.17
permitted 0.17
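The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The page does not say how the scores were computed. A common approach, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding column and sort the results. The sketch below does this in plain Python. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical, and the sketch ignores details such as a final layer norm.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_vec: Sequence[float],
                  unembed_cols: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, col))
            for tok, col in unembed_cols.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom

# Toy 2-dimensional example; these vectors are made up for illustration only.
decoder = [1.0, 0.0]
unembed = {"allowed": [0.2, 0.5], "Lopez": [-0.1, 0.3], "welcome": [0.15, 0.0]}
top, bottom = top_and_bottom(logit_effects(decoder, unembed), k=1)
print(top, bottom)
```

With real weights, `decoder` would be the feature's row of the SAE decoder and `unembed` the model's unembedding matrix, which is why near-duplicate tokens such as "allowed" and " allowed" can both appear in the list.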
Activations Density 0.012%
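An activation density of 0.012% means the feature is active on roughly 12 of every 100,000 tokens. The page does not define density. Assuming it is the share of tokens whose activation exceeds zero, it can be computed as in the minimal sketch below. `activation_density` and its `threshold` parameter are hypothetical names.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is above `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 120 active tokens out of 1,000,000 corresponds to a density of 0.012%.
acts = [0.0] * 999_880 + [1.0] * 120
print(activation_density(acts))
```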