Explanations
phrases indicating quantity or frequency
Negative Logits
Token        Logit
zig          -0.17
aters        -0.15
fragistics   -0.15
urb          -0.15
forth        -0.14
oretical     -0.14
inki         -0.14
repid        -0.14
.twimg       -0.14
ux           -0.14
Positive Logits
Token        Logit
-the         0.31
s            0.28
town         0.26
abouts       0.25
/about       0.25
town         0.25
about        0.23
-town        0.22
thew         0.22
/on          0.21
Activations Density 0.055%