INDEX
Explanations
references to people beyond oneself or one's immediate circle
Negative Logits
otherwise   -0.18
edly        -0.17
itself      -0.17
rail        -0.15
另一        -0.15
swers       -0.14
ibur        -0.14
Other       -0.14
entai       -0.14
autre       -0.14
Positive Logits
-than       0.22
most        0.20
world       0.19
wis         0.19
/new        0.19
/all        0.18
ness        0.18
bes         0.18
besides     0.18
elves       0.17
Activations Density 0.042%