Explanations
recurring patterns in various contexts
Head Attr Weights
Head    Weight
0       0.01
1       0.02
2       0.06
3       0.04
4       0.07
5       0.02
6       0.07
7       0.47
8       0.02
9       0.03
10      0.08
11      0.06
Negative Logits
Token       Logit
kamp        -1.71
uncond      -1.37
orthy       -1.35
戦          -1.31
Asked       -1.28
leased      -1.26
nai         -1.26
required    -1.26
zai         -1.25
wee         -1.22
Positive Logits
Token       Logit
patterns    1.83
behavior    1.69
gradient    1.64
behaviour   1.64
Pattern     1.62
pattern     1.62
strand      1.60
Patterns    1.59
behaviors   1.50
destruct    1.49
Activations Density 0.010%