Explanations
references to universal concepts and human rights
Negative Logits
ゥ       -0.17
yonel   -0.16
oret    -0.15
ання    -0.15
iw      -0.15
esar    -0.15
ONO     -0.14
šli     -0.14
eden    -0.14
кет     -0.14
Positive Logits
Universal   0.21
/global     0.19
universal   0.19
ist         0.18
Universal   0.18
adel        0.18
iversal     0.18
istic       0.17
UNIVERS     0.17
Studios     0.17
Activations Density 0.011%