INDEX
Explanations
numerical data and references to academic papers or studies
Citations with volume and page numbers
citation formatting
Negative Logits
themselves    -0.82
Their         -0.77
their         -0.77
Their         -0.77
yourselves    -0.76
collectively  -0.71
themselves    -0.71
their         -0.71
eds           -0.71
deres         -0.71
Positive Logits
himself       0.87
himself       0.70
一人で        0.64
his           0.61
himſelf       0.60
själv         0.60
herself       0.60
alone         0.57
OMITBAD       0.55
IEWS          0.55
Activations Density 0.127%