INDEX
Explanations
references to personal or collective relationships and community connections
Negative Logits
Token      Logit
667        -0.16
himself    -0.15
oneself    -0.15
PLY        -0.14
ig         -0.14
алеж       -0.14
ruise      -0.14
Himself    -0.14
itself     -0.14
owed       -0.13
Positive Logits
Token      Logit
others     0.28
Others     0.23
Others     0.22
others     0.22
anyone     0.20
anybody    0.17
Anyone     0.16
society    0.16
everyone   0.15
millions   0.15
Activations Density 0.100%