INDEX
Explanations
phrases that convey different types of emphasis and excitement in the text
Negative Logits
ügen          -0.18
isay          -0.16
nip           -0.16
oins          -0.16
igate         -0.15
lify          -0.15
accumulate    -0.15
imore         -0.15
istar         -0.15
Enlarge       -0.15
Positive Logits
don           0.23
wik           0.18
rank          0.17
Perr          0.17
Don           0.17
don           0.16
try           0.16
limit         0.15
Don           0.15
arm           0.15
Activations Density 0.355%