INDEX
Explanations
the key concept of governmental policies
Head Attr Weights
0:0.07
1:0.08
2:0.08
3:0.07
4:0.09
5:0.09
6:0.07
7:0.06
8:0.09
9:0.08
10:0.08
11:0.08
Negative Logits
apters        -2.80
doi           -2.62
onomic        -2.40
γ             -2.37
Martha        -2.34
Trainer       -2.26
longitudinal  -2.20
stereotype    -2.19
atible        -2.17
puberty       -2.17
Positive Logits
Peng              2.63
-->               2.59
Geh               2.53
フォ                2.52
|--               2.50
----------------  2.46
Corruption        2.44
swearing          2.43
ettel             2.38
jriwal            2.36
Activations Density 0.000%