INDEX
Explanations
mentions of political figures or events related to a specific region
mentions of a specific term or name
Negative Logits
Wi              -0.70
predetermined   -0.70
cloud           -0.69
spectrum        -0.68
STD             -0.68
microw          -0.65
field           -0.64
Thu             -0.63
bytes           -0.63
scratch         -0.62
Positive Logits
ador            4.78
ados            1.21
ado             1.15
ad              1.05
ada             1.05
atorial         1.03
uno             1.02
amura           1.02
istar           1.00
ando            0.98
Activations Density: 0.015%