Explanations
expressions related to social interactions and relationships
Negative Logits
...    -0.35
"      -0.28
…      -0.27
...    -0.24
“      -0.23
-↵     -0.23
-↵     -0.22
-      -0.22
[      -0.21
↵      -0.21
Positive Logits
.Companion    0.15
(«            0.14
」「          0.14
_marshall     0.14
uitka         0.14
`%            0.14
(crate        0.13
putas         0.13
伸            0.13
".$_          0.13
Activations Density 0.028%
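
The density figure above can be read as the fraction of token positions on which the feature fires. The sketch below illustrates that reading only. It assumes "fires" means an activation strictly greater than zero, and the function name `activation_density` and the sample data are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must contain at least one value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 9997 + [0.4, 1.2, 0.05]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

A density as low as 0.028% would mean the feature is active on roughly 3 tokens in every 10,000 under this definition.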