INDEX
Explanations
references to a specific news agency
references to a specific news outlet
Negative Logits
Token      Logit
aire       -0.82
ary        -0.80
istics     -0.78
istas      -0.75
mans       -0.71
selves     -0.70
ende       -0.68
acters     -0.68
Thrones    -0.66
ista       -0.66
Positive Logits
Token      Logit
BILITY     1.30
BLE        1.10
BILITIES   1.03
zza        0.99
ULT        0.95
EA         0.90
ircraft    0.85
xia        0.85
HL         0.84
ccess      0.83
Activations Density 0.020%