Explanations
references to the city of Jerusalem
mentions of the city of Jerusalem
Negative Logits
lass        -0.81
istration   -0.81
lasses      -0.79
ramid       -0.75
ority       -0.74
BOOK        -0.74
ured        -0.73
ascript     -0.72
isher       -0.71
innacle     -0.70
Positive Logits
Jerusalem   1.30
usalem      1.25
Galile      0.89
Aviv        0.82
Pradesh     0.81
Canaan      0.79
Beirut      0.77
Judah       0.74
Palestine   0.74
aretz       0.72
Activation Density: 0.006%