INDEX
Explanations
references to social dynamics and group relationships
Negative Logits
alone        -0.23
だけ         -0.21
themselves   -0.20
often        -0.19
sometimes    -0.18
altogether   -0.18
Alone        -0.17
เอง          -0.17
each         -0.16
only         -0.16
Positive Logits
except       0.36
except       0.36
кроме        0.30
imaginable   0.30
Except       0.30
Except       0.30
_except      0.28
including    0.27
včetně       0.26
INCLUDING    0.25
Activations Density 0.451%