INDEX
Explanations
words and phrases related to familial or interpersonal relationships
Negative Logits
öt
-0.15
—↵
-0.15
Äįin
-0.14
JKLMNOP
-0.14
#
-0.14
ActionTypes
-0.14
-strokes
-0.14
alleng
-0.14
==>
-0.14
readcr
-0.13
Positive Logits
-âĢIJ
0.28
âĢIJ
0.26
-
0.26
–
0.25
âĪĴ
0.24
âĢij
0.23
{-
0.22
-
0.21
Âĸ
0.21
--
0.20
Activations Density 0.634%