INDEX
Explanations
phrases indicating procedural steps or actions required to achieve a goal
Head Attr Weights
0:0.01
1:0.01
2:0.10
3:0.08
4:0.09
5:0.03
6:0.04
7:0.36
8:0.02
9:0.02
10:0.10
11:0.08
Negative Logits
ividually     -1.59
Cosponsors    -1.59
nces          -1.51
ukong         -1.45
arian         -1.45
elled         -1.44
Favorite      -1.37
ت             -1.36
ebted         -1.34
unct          -1.33
Positive Logits
Exit              1.84
diligence         1.60
reconciliation    1.59
unification       1.55
advancement       1.55
roadmap           1.53
emancipation      1.53
refinement        1.52
easing            1.51
refining          1.50
Activations Density 0.013%