INDEX
Explanations
references to Jerusalem and related geographical or historical terms
Negative Logits
arom     -0.16
ut       -0.16
ond      -0.15
lette    -0.15
oco      -0.14
治       -0.14
astes    -0.14
hiba     -0.14
ono      -0.14
awah     -0.14
Positive Logits
ormsg    0.18
gend     0.16
pNet     0.15
ASI      0.14
Void     0.14
aled     0.14
ft       0.13
λοι      0.13
addin    0.13
asi      0.13
Activations Density 0.011%