INDEX
Explanations
verbs related to actions or movements
Head Attr Weights
0:0.08
1:0.08
2:0.09
3:0.09
4:0.08
5:0.08
6:0.05
7:0.08
8:0.07
9:0.09
10:0.08
11:0.08
Negative Logits
paren        -2.53
lich         -2.17
EStream      -2.14
Cosponsors   -2.13
model        -2.08
ntil         -2.08
Piper        -2.07
forcer       -2.04
Buster       -2.03
rompt        -2.02
Positive Logits
advancing            2.28
spans                2.25
Released             2.23
archive              2.12
oceans               2.08
streams              2.00
piring               1.99
besie                1.99
channelAvailability  1.99
=                    1.99
Activations Density: 0.000%