Explanations
the use of commas in text
Head Attr Weights
0:0.10
1:0.08
2:0.12
3:0.06
4:0.04
5:0.06
6:0.06
7:0.03
8:0.07
9:0.05
10:0.17
11:0.11
Negative Logits
��             -2.56
newcom         -2.14
Funny          -2.08
Newsp          -2.06
distur         -2.05
censorship     -1.97
stuffing       -1.90
Latest         -1.90
Flavoring      -1.84
Advertisement  -1.84
Positive Logits
k   2.63
xi  2.39
lo  2.35
ks  2.34
xs  2.31
lb  2.28
mx  2.23
kb  2.23
gm  2.21
kt  2.21
Activations Density 0.000%