INDEX
Explanations
phrases that indicate quantities or numerical relationships
Head Attr Weights
0:0.02
1:0.02
2:0.07
3:0.05
4:0.27
5:0.03
6:0.04
7:0.18
8:0.03
9:0.04
10:0.09
11:0.10
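
The head attribution weights listed above can be ranked to see which attention heads contribute most to this feature. The following is a minimal sketch: the dictionary values are copied from the listing, while the names `head_attr_weights` and `top_heads` are hypothetical and not part of any dashboard API. The listed weights sum to about 0.94 rather than 1, which may simply reflect rounding to two decimals.

```python
# Head attribution weights copied from the listing (head index -> weight).
# The variable and function names are illustrative assumptions.
head_attr_weights = {
    0: 0.02, 1: 0.02, 2: 0.07, 3: 0.05,
    4: 0.27, 5: 0.03, 6: 0.04, 7: 0.18,
    8: 0.03, 9: 0.04, 10: 0.09, 11: 0.10,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


ranked = top_heads(head_attr_weights)
total = sum(head_attr_weights.values())

for head, weight in ranked:
    # Share of the listed total carried by this head.
    print(f"head {head}: weight {weight:.2f} ({weight / total:.0%} of listed total)")
```

On these numbers, heads 4 and 7 together account for nearly half of the listed attribution.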
Negative Logits
comings       -1.47
ciation       -1.41
availability  -1.40
success       -1.37
deployments   -1.33
Addiction     -1.26
corruption    -1.26
ERY           -1.25
EStream       -1.25
amphetamine   -1.25
Positive Logits
quished       1.62
stakeholders  1.53
rians         1.42
supervisors   1.39
subsidiaries  1.39
volunteers    1.38
intermedi     1.36
scrut         1.36
bystanders    1.36
shareholders  1.35
Activations Density 0.001%