Explanations
references to French culture and cuisine
Negative Logits
Houſe  -0.80
ainfi  -0.79
Inscrivez  -0.77
Reſ  -0.77
guisement  -0.76
ंदीखरीदारी  -0.75
Eſ  -0.73
Cæsar  -0.71
Anſ  -0.71
Jérusalem  -0.70
Positive Logits
French  1.27
France  1.16
French  1.05
french  1.04
Paris  1.02
France  0.97
フランス  0.97
француз  0.93
FRENCH  0.90
法国  0.89
Activations Density 0.639%