Explanations
verbs and phrases indicating potential actions or capabilities
Head Attr Weights
0:0.02
1:0.02
2:0.08
3:0.10
4:0.12
5:0.03
6:0.26
7:0.14
8:0.02
9:0.05
10:0.06
11:0.05
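
A minimal sketch of how the head attribution weights listed above could be parsed and ranked. The `head:weight` values are copied from the list; the helper names `parse_head_weights` and `top_heads` are hypothetical, and reading each weight as that head's share of the attribution is an assumption rather than something the dashboard states.

```python
# Head attribution weights exactly as listed above, in "head:weight" form.
RAW = "0:0.02 1:0.02 2:0.08 3:0.10 4:0.12 5:0.03 6:0.26 7:0.14 8:0.02 9:0.05 10:0.06 11:0.05"


def parse_head_weights(raw):
    """Turn 'head:weight' entries into a {head_index: weight} dict."""
    weights = {}
    for entry in raw.split():
        head, weight = entry.split(":")
        weights[int(head)] = float(weight)
    return weights


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


weights = parse_head_weights(RAW)
top = top_heads(weights)
total = round(sum(weights.values()), 2)

print(top)    # heads ranked by weight, largest first
print(total)  # sum of the listed (rounded) weights
```

With these values, head 6 carries the largest weight (0.26), followed by heads 7 and 4. The listed weights sum to 0.95 rather than 1.00, which may simply reflect rounding to two decimals.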
Negative Logits
announcer       -1.35
sovereignty     -1.33
executions      -1.24
AMA             -1.20
optional        -1.20
consultations   -1.19
Valkyrie        -1.18
commentary      -1.18
terday          -1.17
consultation    -1.17
Positive Logits
gettable        1.61
ract            1.50
én              1.50
ipher           1.44
VIEW            1.42
hett            1.37
iard            1.37
fy              1.36
anes            1.32
idable          1.30
Activations Density 0.012%