Explanations
the presence of noun forms and other specific upward or downward indications across a range of text
Negative Logits
theless  -0.28
plier    -0.27
ember    -0.24
aurant   -0.23
ة        -0.22
thing    -0.21
folio    -0.21
uary     -0.20
thesis   -0.20
acity    -0.20
Positive Logits
Wolff    0.16
ungs     0.16
şt       0.15
uards    0.15
ards     0.14
bbb      0.14
bras     0.14
places   0.14
회의     0.14
ej       0.14
Activations Density 0.290%