INDEX
Explanations
mentions of social media accounts or handles
Head Attr Weights
0:0.09
1:0.03
2:0.06
3:0.06
4:0.09
5:0.07
6:0.17
7:0.10
8:0.08
9:0.11
10:0.04
11:0.06
Negative Logits
Mike       -4.16
Mike       -3.92
Manny      -3.89
Michael    -3.76
MJ         -3.72
Saul       -3.72
Michael    -3.56
Marty      -3.25
Pete       -3.25
Marc       -3.18
Positive Logits
bottleneck      3.02
barracks        2.99
ADRA            2.82
plantations     2.72
biodiversity    2.72
dracon          2.71
villages        2.71
exting          2.70
forests         2.70
genocide        2.69
Activations Density 0.000%