INDEX
Explanations
phrases indicating options for opting out or unsubscribing
Head Attribution Weights
0:0.01
1:0.01
2:0.12
3:0.13
4:0.10
5:0.02
6:0.07
7:0.15
8:0.03
9:0.06
10:0.11
11:0.12
Negative Logits
vironments   -1.42
sers         -1.38
Lies         -1.36
abal         -1.34
ゼウス       -1.32
ilts         -1.30
Continent    -1.24
iaries       -1.23
direction    -1.22
igate        -1.22
Positive Logits
Flavoring    1.30
*/(          1.28
��           1.28
withdrawal   1.26
subscribing  1.25
occasional   1.24
tresp        1.20
unwanted     1.20
unwelcome    1.19
asar         1.14
Activations Density 0.001%