INDEX
Explanations
references to a specific team or organization
Head Attr Weights
0:0.02
1:0.01
2:0.09
3:0.07
4:0.07
5:0.03
6:0.09
7:0.31
8:0.03
9:0.05
10:0.13
11:0.05
Negative Logits
ogram          -1.58
Likes          -1.56
forgiveness    -1.54
wives          -1.54
netflix        -1.51
Vine           -1.51
Ruin           -1.48
Laf            -1.45
Nile           -1.44
Ame            -1.42
Positive Logits
Panel          1.72
maneu          1.65
earable        1.61
iHUD           1.58
propulsion     1.57
cockpit        1.55
Climate        1.53
ENTION         1.53
cffffcc        1.52
dimensional    1.51
Activations Density 0.000%