INDEX
Explanations
mentions of the city Paris in various contexts
Negative Logits
tir      -0.19
tor      -0.17
halt     -0.16
yun      -0.16
amber    -0.16
eer      -0.16
eous     -0.16
dong     -0.15
tings    -0.15
tors     -0.15
Positive Logits
ian      0.31
ians     0.23
IAN      0.20
Hilton   0.20
ienne    0.20
cope     0.19
ة        0.17
itic     0.17
ien      0.16
ney      0.16
Activations Density 0.008%