INDEX
Explanations
locations or facilities associated with transportation or large gatherings
places associated with public infrastructure and facilities
Negative Logits
XT                  -0.70
ILA                 -0.63
teness              -0.63
ETH                 -0.62
CHAT                -0.62
maxwell             -0.59
anan                -0.59
\\\\\\\\\\\\\\\\    -0.59
onom                -0.58
ilogy               -0.57
Positive Logits
hops                1.33
chool               1.21
hips                1.11
frequ               0.97
hare                0.97
paces               0.96
mith                0.94
pec                 0.89
chwitz              0.88
kitchens            0.85
Activations Density: 0.298%