INDEX
Explanations
instances of complex words or phrases
Head Attr Weights
0:0.03
1:0.02
2:0.08
3:0.29
4:0.03
5:0.03
6:0.08
7:0.11
8:0.04
9:0.12
10:0.05
11:0.08
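As a rough illustration of how the head attribution weights above might be read, the sketch below copies the twelve listed values into a dictionary and picks out the head with the largest weight. The variable names are hypothetical, and treating each weight as a share of the listed total is an assumption made for illustration, not something the listing states.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.03, 1: 0.02, 2: 0.08, 3: 0.29,
    4: 0.03, 5: 0.03, 6: 0.08, 7: 0.11,
    8: 0.04, 9: 0.12, 10: 0.05, 11: 0.08,
}

# Head with the largest listed weight.
top_head = max(head_weights, key=head_weights.get)

# Assumption: compare against the sum of the listed (rounded) values,
# which need not equal exactly 1.
total = sum(head_weights.values())
top_share = head_weights[top_head] / total

print(f"top head: {top_head}")
print(f"weight: {head_weights[top_head]:.2f}")
print(f"share of listed total: {top_share:.1%}")
```

Because the listed values are rounded to two decimals, any share computed this way is only approximate.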
Negative Logits
NOW        -1.23
flare      -1.18
aber       -1.17
premiums   -1.16
Zeal       -1.14
terness    -1.14
Crimes     -1.12
flares     -1.10
hypoc      -1.10
vP         -1.10
Positive Logits
rency      1.52
ciating    1.33
fold       1.29
utive      1.26
odge       1.26
undle      1.25
geon       1.20
ré         1.20
iture      1.14
runners    1.13
Activations Density 0.001%