Explanations
references to French-related entities or topics
mentions of the word "French."
Negative Logits
Token                 Logit
ividual               -0.92
iary                  -0.91
ramid                 -0.89
ithing                -0.84
regor                 -0.84
affles                -0.81
oots                  -0.81
ographed              -0.80
razil                 -0.79
isSpecialOrderable    -0.79
Positive Logits
Token                 Logit
fries                 0.95
Riv                   0.88
Connection            0.82
Alps                  0.82
antioxid              0.78
wings                 0.77
ois                   0.77
countryside           0.77
satirical             0.74
men                   0.73
Activations Density: 0.026%
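
The density figure is commonly read as the fraction of sampled tokens on which the feature fires. The page does not state its definition, so the following is only a minimal sketch under that assumption; the function name, the zero threshold, and the sample values are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as "active" when the feature's
    activation on it is strictly greater than `threshold`.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, 3 of which activate the feature.
sample = [0.0] * 9997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.026% would mean the feature is active on roughly 26 of every 100,000 sampled tokens.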