INDEX
Explanations
mentions of specific locations, with a focus on cities
mentions of specific geographic locations or entities
Negative Logits
essee      -0.80
barriers   -0.71
ulhu       -0.67
IBLE       -0.66
istics     -0.63
bully      -0.63
Jericho    -0.62
Arkham     -0.62
arche      -0.61
ural       -0.60
Positive Logits
stre       0.93
chn        0.88
loo        0.87
bye        0.85
zel        0.85
tsky       0.83
bies       0.83
LM         0.83
nel        0.82
let        0.82
Activations Density: 0.049%