INDEX
Explanations
phrases that indicate measurements or rates
Negative Logits
σχ        -0.17
unger     -0.16
yard      -0.15
routine   -0.14
us        -0.14
erties    -0.14
eg        -0.14
们        -0.14
info      -0.13
jen       -0.13
Positive Logits
annum       0.19
isposable   0.15
fter        0.15
iphery      0.15
ipherals    0.15
ipher       0.15
pend        0.15
legate      0.14
.Tx         0.14
keer        0.14
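A common way to produce logit lists like these is to project the feature's decoder direction through the model's unembedding matrix and read off the most negative and most positive entries. The sketch below assumes that method, uses NumPy arrays, and ignores the final LayerNorm. The function name `top_logits` and all toy data are hypothetical and are not the dashboard's actual values.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary (hypothetical helper).

    decoder_dir: shape (d_model,), the feature's decoder (write) vector.
    W_U: shape (d_model, vocab_size), the unembedding matrix.
    Returns (most negative, most positive) as lists of (token, score).
    Assumption: the final LayerNorm is ignored.
    """
    scores = decoder_dir @ W_U          # (vocab_size,) direct effect on each logit
    order = np.argsort(scores)          # ascending: most negative first
    neg = [(vocab[i], float(scores[i])) for i in order[:k]]
    pos = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy example: 4-dim residual stream, 5-token vocabulary, random made-up weights.
vocab = ["annum", "yard", "info", "pend", "us"]
rng = np.random.default_rng(0)
W_U = rng.normal(size=(4, 5))
decoder_dir = rng.normal(size=4)
neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting the full score vector is simple for a toy vocabulary; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.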
Activation Density 0.024%
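Activation density is usually read as the fraction of token positions on which the feature fires, meaning its activation exceeds zero. The sketch below assumes that definition with a zero threshold. The helper `activation_density` and the counts are hypothetical: 24 firing tokens out of 100,000 were chosen only to reproduce the 0.024% figure.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Hypothetical activations: the feature fires on 24 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:24] = 1.3
print(f"{activation_density(acts):.3%}")  # → 0.024%
```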