Explanations
words related to instructions, guidance, or procedural contexts
Head Attribution Weights (head:weight)
0:0.03
1:0.02
2:0.02
3:0.04
4:0.05
5:0.05
6:0.03
7:0.06
8:0.01
9:0.07
10:0.02
11:0.54
Negative Logits
Flavoring   -2.38
Priv        -1.97
prem        -1.96
cest        -1.95
Rating      -1.95
Quant       -1.92
Property    -1.92
things      -1.91
Bris        -1.90
ック          -1.88
Positive Logits
directions  3.80
footsteps   3.48
steps       3.36
path        3.25
route       3.24
paths       3.15
trajectory  3.01
trend       2.97
traject     2.96
playbook    2.92
Activations Density 0.074%