INDEX
Explanations
the presence of phrases indicating purpose or intent
Head Attr Weights
Head  Weight
0     0.01
1     0.02
2     0.09
3     0.18
4     0.01
5     0.03
6     0.05
7     0.10
8     0.06
9     0.22
10    0.07
11    0.10
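A minimal sketch for reading the weights above, assuming they are per-head attribution shares for this feature. The names `head_weights` and `top_heads` are hypothetical and not part of the dashboard. As listed, the weights sum to 0.94 rather than 1; the source does not say whether that comes from rounding or from how the weights are normalized.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.01, 1: 0.02, 2: 0.09, 3: 0.18, 4: 0.01, 5: 0.03,
    6: 0.05, 7: 0.10, 8: 0.06, 9: 0.22, 10: 0.07, 11: 0.10,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights.

    sorted() is stable, so heads with equal weight keep their listed order.
    """
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


total = sum(head_weights.values())  # sum of the listed weights
top = top_heads(head_weights)       # the three highest-weighted heads

print(f"sum of listed weights: {total:.2f}")
for head, weight in top:
    print(f"head {head}: {weight:.2f}")
```

Heads 9 and 3 carry the two largest weights in the listing; heads 7 and 11 tie for third, and the sketch keeps whichever appears first.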
Negative Logits
Token       Logit
idays       -1.21
isen        -1.15
hai         -1.15
zees        -1.13
rite        -1.11
seekers     -1.10
Fever       -1.09
mentioned   -1.09
items       -1.08
parts       -1.08
Positive Logits
Token        Logit
corrid       1.20
optimal      1.17
LOCK         1.17
Mata         1.15
destro       1.14
ertodd       1.12
unorthodox   1.11
proprietary  1.09
niche        1.06
withd        1.03
Activations Density: 0.019%