Explanations
references to people or groups in a community context
Negative Logits
themselves  -0.20
and         -0.17
otherwise   -0.17
itself      -0.16
edly        -0.16
swers       -0.15
ibur        -0.15
rail        -0.14
åı¦ä¸Ģ      -0.14
isode       -0.14
Positive Logits
-than       0.24
wis         0.20
/new        0.20
besides     0.20
bes         0.20
world       0.20
most        0.19
/all        0.17
ness        0.17
türlü       0.17
Activations Density: 0.046%