INDEX
Explanations
occurrences of the term "French" and relevant numerical relationships to it
Negative Logits
jit             -0.16
adem            -0.16
.Abstractions   -0.15
يرا             -0.14
nors            -0.14
greg            -0.14
stav            -0.14
стров           -0.14
ře              -0.14
ity             -0.14
Positive Logits
man             0.28
men             0.28
boro            0.24
ified           0.23
-speaking       0.22
Quarter         0.22
fries           0.21
spe             0.21
mans            0.20
bulld           0.20
Activations Density 0.022%