Explanations
expressions of regret or realizations of past mistakes
Head Attribution Weights
Head   Weight
0      0.08
1      0.05
2      0.00
3      0.11
4      0.09
5      0.09
6      0.04
7      0.03
8      0.30
9      0.12
10     0.01
11     0.02
Negative Logits
Token      Logit
api        -1.64
Alexa      -1.63
pour       -1.52
guide      -1.52
bots       -1.51
Miracle    -1.49
エ         -1.49
virt       -1.49
emouth     -1.47
tower      -1.42
Positive Logits
Token         Logit
recalled      1.71
recol         1.67
commit        1.67
hindsight     1.67
cringe        1.66
commit        1.66
handwritten   1.65
reminis       1.64
recalling     1.64
rences        1.61
Activation Density: 0.001%