Explanations
phrases related to personal identity and social connections
Negative Logits
are       -0.19
是在      -0.16
lerdir    -0.16
is        -0.16
veis      -0.16
は        -0.16
是我      -0.15
جه        -0.15
adalah    -0.14
ijken     -0.14
Positive Logits
AtPath    0.17
been      0.16
been      0.16
Been      0.15
été       0.15
367       0.14
mite      0.14
loon      0.14
ビー      0.14
186       0.13
Activations Density 0.012%