INDEX
Explanations
references to the city of Paris
Negative Logits
tir         -0.16
recru       -0.16
swire       -0.15
tor         -0.15
orrh        -0.15
vetica      -0.15
halt        -0.15
célib       -0.14
.inflate    -0.14
frau        -0.14
Positive Logits
ian         0.35
ienne       0.28
ians        0.27
ien         0.27
Hilton      0.24
IAN         0.23
iens        0.22
cope        0.20
ién         0.19
Match       0.18
Activations Density: 0.017%