INDEX
Explanations
occurrences of political or legal terms
Head Attr Weights
0:0.05
1:0.36
2:0.04
3:0.05
4:0.04
5:0.04
6:0.13
7:0.04
8:0.04
9:0.09
10:0.03
11:0.03
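
As a minimal sketch, the head attribution weights listed above can be ranked to see which attention heads contribute most. The values are copied from the list as displayed; the names `head_weights` and `rank_heads` are hypothetical and chosen for illustration only. The displayed values are rounded, so their sum need not be exactly 1.

```python
# Head attribution weights copied from the list above (head index -> weight).
# The dictionary and function names are illustrative assumptions, not part of any tool's API.
head_weights = {
    0: 0.05, 1: 0.36, 2: 0.04, 3: 0.05, 4: 0.04, 5: 0.04,
    6: 0.13, 7: 0.04, 8: 0.04, 9: 0.09, 10: 0.03, 11: 0.03,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda pair: pair[1], reverse=True)


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]
total = sum(head_weights.values())

print(f"Top head: {top_head} (weight {top_weight:.2f})")
print(f"Top three heads: {ranked[:3]}")
print(f"Sum of displayed weights: {total:.2f}")
```

On these values, head 1 carries the largest weight, followed by heads 6 and 9.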
Negative Logits
fifty: -3.17
forty: -2.98
sixty: -2.93
thirty: -2.78
itely: -2.72
ivably: -2.72
510: -2.56
twenty: -2.56
acebook: -2.52
536: -2.52
Positive Logits
main: 5.33
main: 5.33
Main: 4.36
Main: 4.24
mainly: 3.80
mostly: 3.41
mostly: 3.19
Mostly: 2.96
biggest: 2.81
major: 2.64
Activations Density 0.002%