INDEX
Explanations
instances of the word "regular" or its variants, indicating a focus on routine or consistency
Negative Logits
ollen   -0.17
chod    -0.16
guard   -0.15
chine   -0.15
elson   -0.15
jom     -0.15
gun     -0.15
cron    -0.14
Vien    -0.14
oge     -0.14
Positive Logits
ities   0.25
ity     0.24
mente   0.23
lah     0.19
s       0.18
ized    0.18
lies    0.18
ly      0.18
ily     0.17
sand    0.17
Activations Density: 0.049%