INDEX
Explanations
phrases related to news headlines
uppercase letter sequences that suggest names or places
Negative Logits
Token     Logit
Diaz      -0.65
Franks    -0.65
iors      -0.63
selves    -0.61
kson      -0.61
fe        -0.59
Germans   -0.57
stood     -0.56
self      -0.55
bushes    -0.55
Positive Logits
Token     Logit
MAN       1.22
AN        1.14
ANS       1.14
LAND      1.13
COL       1.13
EN        1.13
CLAIM     1.12
VIEW      1.12
VER       1.10
VILLE     1.10
Activations Density: 0.075%