INDEX
Explanations
geographical locations or entities from different regions, potentially related to news or events
geographical locations and proper nouns
Negative Logits
Arlington    -0.99
Allan        -0.98
Fairfax      -0.89
Ambrose      -0.89
Celt         -0.86
Alexandria   -0.84
Unicorn      -0.82
Stall        -0.81
Ung          -0.78
Goldstein    -0.77
Positive Logits
j      1.43
ij     1.37
ji     1.25
ja     1.25
J      1.21
jj     1.18
jri    1.14
je     1.14
Js     1.13
jas    1.12
Activations Density 0.355%