Explanations
references to personal relationships and social dynamics involving individuals and groups
Negative Logits
Token           Logit
�               -0.20
Â               -0.19
Âĸ              -0.16
ÂĶ              -0.16
ãĤĪãģĨãģ§ãģĻ    -0.15
Âĵ              -0.15
ÂĿ              -0.14
ÃĤ              -0.14
´t              -0.14
´               -0.14
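Several token strings here (ãĤĪãģĨãģ§ãģĻ, ÂĶ, ÃĤ) look like GPT-2-style byte-level BPE vocabulary entries, in which every raw byte is mapped to a printable character. The page does not name the tokenizer, so the following is a minimal sketch assuming that scheme: it inverts GPT-2's byte-to-character table and decodes a token back to UTF-8 text. Under this assumption ãĤĪãģĨãģ§ãģĻ decodes to the Japanese ようです, ÃĤ decodes to Â, and a lone Â is an incomplete UTF-8 byte that decodes to the replacement character �. In the positive list, çļĦ, ìĿĺ and ãģ® decode to 的, 의 and の, the Chinese, Korean and Japanese possessive markers. `decode_token` is a hypothetical helper name, and strings containing characters outside the table (such as ’s) are assumed to be already decoded and are returned unchanged.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2 byte-level BPE table: raw byte value -> printable character."""
    # Printable Latin-1 bytes map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) map to code points 256, 257, ...
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: map each character back to its raw byte, decode as UTF-8.

    Invalid or incomplete UTF-8 sequences become U+FFFD (the � character).
    Strings with characters outside the table are returned unchanged,
    on the assumption that the dashboard already decoded them.
    """
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĤĪãģĨãģ§ãģĻ", "ÃĤ", "Â", "çļĦ", "ìĿĺ", "ãģ®"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

A single-byte token such as Â cannot be decoded on its own because it is only the first half of a two-byte UTF-8 character; that is why several entries in this list read as fragments.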
Positive Logits
Token           Logit
's              0.82
’s              0.75
çļĦ             0.63
ìĿĺ             0.54
ãģ®             0.45
çļĦ大           0.44
‘s              0.43
´s              0.43
çļĦ             0.43
'S              0.42
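SAE dashboards commonly produce logit lists like these by projecting a feature's decoder vector through the model's unembedding matrix and keeping the most negative and most positive entries. The page does not say how these particular values were computed (some implementations also fold in the final layer norm, which is omitted here), so this is a minimal sketch under that assumption. `top_logit_effects`, `w_dec_row`, `W_U`, and the toy vocabulary and weights are hypothetical.

```python
import numpy as np


def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs
    for one feature's decoder direction projected through the unembedding."""
    logits = np.asarray(w_dec_row) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(logits)                         # indices, ascending by value
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 4, vocabulary of 5 tokens (all values made up).
vocab = ["'s", "’s", "the", "Â", "�"]
W_U = np.zeros((4, 5))
W_U[0] = [0.8, 0.7, 0.0, -0.2, -0.3]
w_dec_row = np.array([1.0, 0.0, 0.0, 0.0])

negative, positive = top_logit_effects(w_dec_row, W_U, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

With a real model, `W_U` would have one column per vocabulary entry, and `k=10` would match the ten entries shown in each list on this page.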
Activation Density: 2.081%
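Activation density is usually reported as the share of sampled token positions on which the feature's activation is above zero. Assuming that definition, 2.081% means the feature fired on roughly 2 of every 100 tokens, or about 20,810 tokens per million. The exact sample and threshold behind this figure are not stated on the page; `activation_density` and its `threshold` parameter are hypothetical names used for illustration.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature's
    activation exceeds `threshold` (0 by default, i.e. 'active' means > 0)."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())


# Toy activations for 8 token positions; 2 of them are nonzero.
acts = [0.0, 1.2, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
density = activation_density(acts)
print(f"{density:.3%}")
```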