INDEX
Explanations
Words or names related to the concept of "new" or "novelty."
Negative Logits
optera   -0.19
thic     -0.17
stab     -0.17
oke      -0.16
pta      -0.16
pth      -0.16
erin     -0.15
achts    -0.15
uario    -0.15
inel     -0.15
Positive Logits
ismatic  0.20
nu       0.17
Nu       0.16
gens     0.15
dea      0.15
íħĶ      0.15
subt     0.14
vem      0.14
NU       0.14
Nu       0.14
Activations Density 0.034%