Explanations
instances of greeting phrases or exclamations
Negative Logits
stice     -0.16
rip       -0.15
.uk       -0.15
orian     -0.14
ouis      -0.14
LESS      -0.14
          -0.14
amd       -0.14
andbox    -0.13
mdir      -0.13
Positive Logits
assed      0.16
prest      0.16
ettle      0.15
arken      0.15
assen      0.15
infr       0.15
AGR        0.15
ullo       0.15
Ler        0.15
ays        0.14
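On dashboards like this one, positive and negative logit lists such as those above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and ranking the resulting per-token scores. The minimal sketch below illustrates that ranking step. It assumes, without confirmation from this page, that the scores are a plain dot product of a decoder vector (`decoder_dir`) with each column of an unembedding matrix (`unembed`, shaped d_model x vocab). All names and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token as decoder_dir . unembed[:, token].

    Assumption: unembed is stored as d_model rows by vocab columns.
    """
    vocab = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab)
    ]


def top_and_bottom(weights: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    order = sorted(range(len(weights)), key=lambda j: weights[j])
    bottom = [(tokens[j], weights[j]) for j in order[:k]]            # most negative first
    top = [(tokens[j], weights[j]) for j in reversed(order[-k:])]    # most positive first
    return bottom, top


# Toy example with a hypothetical 2-dimensional model and 3-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = [[0.1, -0.2, 0.3],
           [0.2, 0.0, -0.4]]
tokens = ["a", "b", "c"]
weights = logit_weights(decoder_dir, unembed)
negative, positive = top_and_bottom(weights, tokens, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

With real model weights, the same ranking over the full vocabulary would yield lists shaped like the ones on this page. Treat the dot-product scoring as an illustrative assumption rather than the dashboard's documented method.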
Activation Density: 0.016%
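Activation density is usually read as the percentage of sampled tokens on which the feature fires. At 0.016%, that is roughly 16 active tokens per 100,000. The sketch below shows that arithmetic. It assumes, without confirmation from this page, that "active" means an activation strictly above a threshold of zero. The function name and the token sample are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 16 activate the feature.
acts = [0.0] * 99_984 + [1.0] * 16
print(f"{activation_density(acts):.3f}%")
```

A density this low means the feature is very sparse: it is silent on almost every token in the sample.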