Explanations
references to the city of Rome
Negative Logits
Token       Logit
curricular  -0.53
ZX          -0.52
Gujarati    -0.51
badger      -0.50
QI          -0.50
PSL         -0.50
ulipas      -0.50
DIF         -0.50
Dif         -0.50
DV          -0.50
Positive Logits
Token  Logit
Rome   2.13
Rome   2.05
ROME   1.63
rome   1.56
ROME   1.13
Roma   0.98
rome   0.96
Roma   0.91
Рим    0.84
罗马   0.81
Activations Density 0.002%