INDEX
Explanations
phrases that indicate significant improvements or advancements
Head Attribution Weights
0:0.02
1:0.04
2:0.06
3:0.05
4:0.01
5:0.04
6:0.04
7:0.06
8:0.06
9:0.43
10:0.06
11:0.08
Negative Logits
stress      -1.28
childbirth  -1.26
offic       -1.25
speak       -1.24
chool       -1.23
ago         -1.20
epend       -1.18
ebook       -1.18
velt        -1.17
rom         -1.16
Positive Logits
inav        1.41
ADRA        1.37
vil         1.36
rahim       1.33
inness      1.28
inas        1.26
="#         1.25
dividing    1.22
orical      1.21
ZI          1.20
Activations Density 0.041%