Explanations
elements indicative of quality literature or insightful writing
Negative Logits
Token     Logit
cess      -0.15
ichni     -0.15
ulumi     -0.15
ourd      -0.15
ORTH      -0.14
erb       -0.14
eff       -0.14
sounding  -0.14
onis      -0.14
irler     -0.13
Positive Logits
Token       Logit
ones        0.24
unique      0.23
theirs      0.21
particular  0.20
Unique      0.20
UNIQUE      0.19
unlike      0.19
special     0.18
unique      0.18
uniquely    0.18
Activations Density: 0.125%