Explanations
dialogue interactions and character responses
Head Attr Weights
0:0.14
1:0.02
2:0.08
3:0.10
4:0.10
5:0.05
6:0.05
7:0.06
8:0.07
9:0.12
10:0.07
11:0.09
Negative Logits
conservancy          -1.93
DragonMagazine       -1.84
️                    -1.67
™:                   -1.52
`,                   -1.45
elig                 -1.43
experien             -1.42
externalActionCode   -1.39
ixel                 -1.39
Downloadha           -1.35
Positive Logits
reply                1.42
replied              1.39
oward                1.22
exploded             1.22
explodes             1.21
Downs                1.21
acker                1.19
angrily              1.18
responds             1.18
Rap                  1.17
Activations Density 0.009%