Explanations
questions about identity and knowledge
Negative Logits
ropa        -0.17
istine      -0.15
talk        -0.15
rech        -0.14
Lem         -0.14
ym          -0.14
ummies      -0.14
talking     -0.13
orian       -0.13
stim        -0.13
Positive Logits
agues       0.16
uraa        0.16
ابی         0.15
sgi         0.14
прис        0.13
cheid       0.13
amedi       0.13
.abstract   0.13
itemid      0.13
putation    0.13
Activations Density 0.155%