Explanations
Mentions of the word "French" and related references to France.
Negative Logits
Token         Logit
jit           -0.17
think         -0.15
jte           -0.15
nbsp          -0.15
rel           -0.15
ded           -0.15
oted          -0.15
reu           -0.14
ationToken    -0.14
readcr        -0.14
Positive Logits
Token         Logit
-speaking      0.21
man            0.17
ostel          0.15
ysz            0.15
esy            0.15
making         0.14
phone          0.14
IRO            0.14
ake            0.14
men            0.14
Activation Density: 0.091%
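As a rough illustration of what the density figure means, here is a minimal Python sketch. It assumes density is the percentage of token positions on which the feature's activation is strictly positive; the dashboard itself does not define the term. The function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations):
    """Return the percentage of token positions with a strictly positive activation.

    Assumption: "density" means the share of tokens on which the feature fires
    (activation > 0). This definition is not stated by the dashboard.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Hypothetical data: the feature fires on 91 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(91):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

print(f"{activation_density(acts):.3f}%")  # prints 0.091%
```

Under this reading, a density of 0.091% means the feature is active on roughly 9 tokens in every 10,000, which fits a narrowly scoped feature such as one tracking references to French and France.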