Explanations
phrases related to discussion or commentary
Head Attr Weights
0:0.01
1:0.01
2:0.11
3:0.11
4:0.11
5:0.03
6:0.04
7:0.22
8:0.04
9:0.04
10:0.14
11:0.09
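
The head weights above are easier to compare once ranked. The following is a minimal sketch, assuming the `head:weight` pairs are per-head attribution weights for heads 0 to 11; the names `head_weights` and `rank_heads` are hypothetical and not part of the source tool.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.01, 1: 0.01, 2: 0.11, 3: 0.11, 4: 0.11, 5: 0.03,
    6: 0.04, 7: 0.22, 8: 0.04, 9: 0.04, 10: 0.14, 11: 0.09,
}


def rank_heads(weights, top_k=3):
    """Return the top_k (head, weight) pairs, largest weight first.

    Ties are broken by the lower head index so the result is deterministic.
    """
    ordered = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:top_k]


top = rank_heads(head_weights)
total = sum(head_weights.values())

for head, weight in top:
    print(f"head {head}: {weight:.2f}")
print(f"sum of listed weights: {total:.2f}")
```

On the values listed here, head 7 carries the largest weight, followed by head 10. The listed weights sum to about 0.95 rather than 1.00, which may simply reflect rounding to two decimals.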
Negative Logits
disproportionately   -1.50
nuns                 -1.40
anchez               -1.33
qqa                  -1.30
iscal                -1.30
traumat              -1.30
missing              -1.28
indicators           -1.28
wrongly              -1.27
underestimated       -1.26
Positive Logits
�                    1.93
Discuss              1.72
Discuss              1.65
ipedia               1.64
VIDEOS               1.60
Forums               1.54
Fame                 1.49
tainment             1.49
airs                 1.47
Prix                 1.43
Activations Density 0.001%