INDEX
Explanations
phrases that express interpersonal connections and interactions
Negative Logits
ety     -0.19
.cg     -0.16
rey     -0.15
ób      -0.15
zh      -0.15
mani    -0.14
ythe    -0.14
gren    -0.14
æ¤      -0.14
enor    -0.14
Positive Logits
found   0.23
find    0.22
found   0.22
finds   0.22
(find   0.21
Find    0.21
找到    0.20
find    0.19
Find    0.18
-find   0.18
Activations Density 0.049%
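
To put the density figure in perspective, here is a minimal sketch. It assumes that "activations density" means the fraction of sampled token positions on which the feature's activation is above zero. The page itself does not state that definition. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is counted per token position, with any
    activation strictly above `threshold` treated as "active".
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 49 active positions out of 100,000 tokens.
toy = [1.0] * 49 + [0.0] * (100_000 - 49)
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.049%
```

Under that assumption, a density of 0.049% corresponds to roughly 49 active positions per 100,000 tokens.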