INDEX
Explanations
phrases indicating a high or significant quantity
Negative Logits
eras    -0.17
uality  -0.16
eros    -0.16
eron    -0.15
places  -0.15
oa      -0.14
imir    -0.14
jin     -0.14
ling    -0.14
cken    -0.14
Positive Logits
itness  0.17
warts   0.16
/out    0.14
jší     0.14
vrier   0.14
293     0.14
-the    0.14
tal     0.14
coat    0.14
esac    0.14
Activations Density 0.023%