Explanations
references to France and French culture
Negative Logits
jit: -0.20
logg: -0.17
ře: -0.16
zd: -0.16
stinence: -0.16
عه: -0.15
eum: -0.15
_formats: -0.15
ohana: -0.15
ensis: -0.15
Positive Logits
man: 0.37
men: 0.35
ies: 0.27
woman: 0.27
ified: 0.27
mans: 0.24
spe: 0.24
-speaking: 0.23
town: 0.23
bulld: 0.23
Activation Density: 0.030%