Explanations
references to comparisons with other or alternative entities
Negative Logits
lain        -0.19
other       -0.18
cken        -0.17
Other       -0.17
その他      -0.17
otherwise   -0.16
autres      -0.15
Other       -0.15
amen        -0.15
swers       -0.15
Positive Logits
-than       0.34
world       0.29
equally     0.29
similarly   0.28
than        0.26
/new        0.25
niż         0.25
ewise       0.24
wis         0.24
similar     0.23
Activations Density 0.112%