Explanations
references to French culture or items related to France
Negative Logits
Token      Logit
à¸ĵ        -0.17
ity        -0.15
ropolis    -0.15
chrome     -0.14
EXPRESS    -0.14
.Charting  -0.14
greg       -0.14
stav       -0.14
otas       -0.14
redd       -0.14
Positive Logits
Token      Logit
men        0.21
boro       0.21
man        0.19
ified      0.18
-speaking  0.18
Quarter    0.18
spe        0.18
elon       0.18
fries      0.17
Brennan    0.17
Activations Density 0.025%