INDEX
Explanations
references to interactions and relationships among individuals and groups
Negative Logits
himself    -0.28
his        -0.19
Himself    -0.19
myself     -0.19
itself     -0.17
خودش       -0.17
his        -0.17
sám        -0.16
ulk        -0.16
los        -0.15
Positive Logits
themselves  0.61
Their       0.37
their       0.35
Their       0.35
leurs       0.34
their       0.34
thems       0.31
yourselves  0.28
их          0.28
jejich      0.27
Activations Density 0.753%