INDEX
Explanations
mentions of groups and affiliations
Negative Logits
Token          Logit
ppl            -0.16
anyone         -0.16
anybody        -0.15
interaction    -0.14
.Dot           -0.14
которое        -0.14
engers         -0.13
oks            -0.13
ção            -0.13
someone        -0.13
Positive Logits
Token          Logit
backgrounds    0.29
whom           0.28
opposite       0.23
Generation     0.21
opposing       0.21
Generation     0.20
professions    0.20
households     0.19
different      0.19
surrounding    0.18
Activations Density 0.116%