INDEX
Explanations
proper nouns
specific proper nouns and numerical data
NEGATIVE LOGITS
misunder    -0.68
iatus       -0.65
Rasm        -0.63
etsk        -0.62
Join        -0.61
ccording    -0.60
ancest      -0.58
umbn        -0.58
tymology    -0.58
ppo         -0.56

POSITIVE LOGITS
vice        0.73
aurus       0.68
aiman       0.67
prefers     0.65
ifle        0.63
thereafter  0.61
expects     0.59
considers   0.59
anges       0.58
clair       0.58
Activations Density 0.383%
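The two kinds of summary statistic on this page can be sketched as follows. This is a minimal illustration under stated assumptions, not the dashboard's actual implementation. It assumes you already have one logit-effect score per vocabulary token for the feature. A common choice is to project the feature's decoder direction onto the unembedding matrix, but this page does not say how its scores were computed. It also assumes "activation density" means the percentage of token positions where the feature's activation exceeds zero. The function names `top_and_bottom_tokens` and `activation_density` are hypothetical.

```python
from typing import Sequence


def top_and_bottom_tokens(
    scores: Sequence[float], vocab: Sequence[str], k: int = 10
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical helper: assumes scores[i] is the logit effect on vocab[i].
    """
    if len(scores) != len(vocab):
        raise ValueError("scores and vocab must have the same length")
    # Sort vocabulary indices by score, most negative first.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Clamp k so that k == 0 yields empty lists instead of the whole vocab.
    k = max(0, min(k, len(order)))
    negative = [(vocab[i], scores[i]) for i in order[:k]]
    # Take the k highest scores and list them most positive first.
    positive = [(vocab[i], scores[i]) for i in reversed(order[len(order) - k:])]
    return negative, positive


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires above threshold.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy vocabulary using a few entries from the lists above; a real
    # vocabulary would contain every token the model can emit.
    vocab = ["vice", "misunder", "aurus", "iatus", "Join"]
    scores = [0.73, -0.68, 0.68, -0.65, -0.61]
    neg, pos = top_and_bottom_tokens(scores, vocab, k=2)
    print(neg)  # [('misunder', -0.68), ('iatus', -0.65)]
    print(pos)  # [('vice', 0.73), ('aurus', 0.68)]
    print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```

Under these assumptions, a density of 0.383% would mean the feature fires on roughly 4 of every 1,000 token positions in the sampled text.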