Explanations
phrases indicating a decline or downfall
Head Attr Weights
0:0.08
1:0.07
2:0.08
3:0.08
4:0.08
5:0.08
6:0.08
7:0.08
8:0.08
9:0.07
10:0.08
11:0.08
Negative Logits
zai:-2.77
orkshire:-2.76
onian:-2.75
owder:-2.64
eng:-2.62
eton:-2.59
lyn:-2.57
collar:-2.56
grain:-2.52
heating:-2.51
Positive Logits
Cait:2.99
Donkey:2.96
Kirby:2.91
Yoshi:2.85
Rogue:2.79
Lucas:2.75
Destination:2.75
Pose:2.70
Fountain:2.67
Skip:2.66
Activations Density 0.000%