INDEX
Explanations
phrases mentioning locations or directions
references to geographic boundaries or borders
Negative Logits
oric        -0.70
TY          -0.70
BILITIES    -0.69
meg         -0.65
ns          -0.64
ptive       -0.63
ags         -0.61
adem        -0.60
ann         -0.60
agara       -0.59
Positive Logits
side        0.87
wagon       0.77
flow        0.73
wagon       0.70
wikipedia   0.68
corridors   0.67
flows       0.67
along       0.67
erous       0.65
iously      0.65
Activations Density: 0.018%