Explanations
Attends to "trend" tokens that come from various forms of the word "trend" across different contexts.
Head Attribution Weights
Head 0: 0.59
Head 1: 0.01
Head 2: 0.02
Head 3: 0.02
Head 4: 0.22
Head 5: 0.03
Head 6: 0.01
Head 7: 0.05
Negative Logits
myſelf: -0.72
Efq: -0.71
raiſ: -0.68
itſelf: -0.66
ſche: -0.66
poffe: -0.63
purpoſe: -0.63
greateſt: -0.62
ſelves: -0.62
fubject: -0.62
Positive Logits
</b>: 0.28
Tembelea: 0.27
antd: 0.26
heures: 0.26
みましょう: 0.25
g: 0.25
Me: 0.25
ziale: 0.25
t: 0.25
</i>: 0.25
Activation Density: 0.007%
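To make the density figure concrete: assuming activation density here means the fraction of sampled token positions on which the feature's activation is above zero (the definition and the zero threshold are assumptions, not stated on the page), 0.007% works out to about 7 active tokens per 100,000. A minimal sketch of that calculation, using a hypothetical helper `activation_density` and made-up activation values:

```python
def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature fires.

    Assumption: "fires" means activation > threshold (default 0.0).
    """
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)


# Hypothetical sample: 100,000 tokens, of which 7 activate the feature.
acts = [0.0] * 100_000
for i in range(7):
    acts[i] = 1.3  # made-up activation value

density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.007%"
```

With a density this low, the feature is very sparse: in a sample of one million tokens it would be expected to fire on roughly 70 of them.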