INDEX
Explanations
words related to emotions or feelings
Negative Logits
óz      -0.17
彦      -0.17
очку    -0.16
viz     -0.16
ятся    -0.16
ят      -0.16
ял      -0.15
achel   -0.15
osed    -0.15
ixel    -0.15
Positive Logits
еств    0.27
ество   0.25
emy     0.21
ества   0.18
emi     0.18
ョ      0.17
ero     0.16
midt    0.16
ews     0.16
ем      0.15
Activations Density 0.040%