INDEX
Explanations
references to news articles or press releases
references to the Associated Press (AP) news agency
Negative Logits
ォ           -0.76
Britann      -0.72
Labrador     -0.70
Fenrir       -0.69
Jew          -0.68
Lieberman    -0.68
ドラゴン     -0.64
true         -0.64
Grimm        -0.62
Labour       -0.62
Positive Logits
PLIED        1.23
AP           1.20
PLIC         1.11
olicy        1.08
rison        1.07
ocalyptic    1.06
ocalypse     1.01
apa          1.00
aeda         0.99
olitan       0.99
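The page does not say how the logit values above were computed. Dashboards of this kind commonly score each vocabulary token by the dot product of the feature's output direction with that token's unembedding vector, then show the highest and lowest scores. The following is only a minimal sketch of that idea under this assumption; the direction, the vocabulary (`tok_a` to `tok_d`), and both function names are hypothetical toy values, not data from this feature.

```python
# Toy sketch: rank tokens by how much a feature direction raises or lowers
# their logits. Every name and number here is hypothetical.

def logit_effects(direction, unembed):
    """Return {token: score}, where score is direction . unembed[token]."""
    return {
        token: sum(d * w for d, w in zip(direction, vector))
        for token, vector in unembed.items()
    }

def top_and_bottom(scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]

toy_direction = [1.0, -0.5]          # assumed feature output direction
toy_unembed = {                      # assumed per-token unembedding vectors
    "tok_a": [1.0, 0.0],
    "tok_b": [0.0, 1.0],
    "tok_c": [-1.0, 0.5],
    "tok_d": [0.5, -2.0],
}

scores = logit_effects(toy_direction, toy_unembed)
positive, negative = top_and_bottom(scores, k=2)
print("positive:", positive)
print("negative:", negative)
```

With real model weights the same ranking would run over the full vocabulary, which is why partial word pieces such as "olicy" or "ocalypse" can appear in the lists.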
Activations Density 0.006%
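To put the density figure in concrete terms, the sketch below converts it to an expected count of activating tokens. It assumes that density means the fraction of sampled tokens on which the feature has a nonzero activation; the page does not define the term, and the helper name `expected_active_tokens` is hypothetical.

```python
def expected_active_tokens(density_percent, n_tokens):
    """Expected number of tokens with a nonzero activation (assumed meaning of density)."""
    return density_percent / 100.0 * n_tokens

# 0.006% is the density reported for this feature.
per_million = expected_active_tokens(0.006, 1_000_000)
print(f"about {per_million:.0f} activating tokens per million")
```

Under that reading, the feature fires on roughly 60 tokens in every million, which marks it as a sparse feature.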