Explanations
variations of the word "no"
Negative Logits
ko       -0.17
rech     -0.17
nee      -0.15
neath    -0.15
kin      -0.15
rego     -0.15
ray      -0.15
cour     -0.15
rik      -0.15
rick     -0.14
Positive Logits
xious     0.30
longer    0.28
things    0.27
zzle      0.26
matter    0.26
doubt     0.25
venta     0.25
isy       0.25
veau      0.24
ël        0.24
Activation Density: 0.043%