INDEX
Explanations
concepts related to priorities and values in society
NEGATIVE LOGITS
GOODMAN
-0.18
isko
-0.14
onas
-0.13
iş
-0.13
ria
-0.13
Tư
-0.13
_DELETED
-0.12
_LS
-0.12
/tos
-0.12
adolu
-0.12
POSITIVE LOGITS
:
0.53
ा:
0.31
์:
0.28
namely
0.27
：
0.27
یعنی
0.26
nam
0.26
viz
0.24
:
0.24
*:
0.23
Activations Density 0.467%