INDEX
Head Attribution Weights (attention head: weight)
0:0.07
1:0.38
2:0.03
3:0.02
4:0.03
5:0.21
6:0.04
7:0.01
8:0.04
9:0.06
10:0.03
11:0.02
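The dashboard does not say how these weights are computed. One common approach for SAEs trained on attention-layer output splits the feature's decoder row into one chunk per head and takes each chunk's share of the total norm. The sketch below assumes that approach. The helper names `head_attribution` and `dominant_heads` and the 0.1 threshold are hypothetical. Applied to the values listed above, it shows that heads 1 and 5 account for most of the weight. The listed values sum to 0.94 rather than 1.00, which may be rounding or a different normalisation.

```python
import math

# Weights copied from the dashboard above (head index -> weight).
HEAD_ATTR = {
    0: 0.07, 1: 0.38, 2: 0.03, 3: 0.02, 4: 0.03, 5: 0.21,
    6: 0.04, 7: 0.01, 8: 0.04, 9: 0.06, 10: 0.03, 11: 0.02,
}

def head_attribution(decoder_row, n_heads, d_head):
    """Hypothetical: share of the decoder row's L2 norm falling in each head's slice.

    Assumes decoder_row is laid out head-major: n_heads contiguous chunks of d_head.
    """
    norms = []
    for h in range(n_heads):
        chunk = decoder_row[h * d_head:(h + 1) * d_head]
        norms.append(math.sqrt(sum(x * x for x in chunk)))
    total = sum(norms)
    return [n / total for n in norms]

def dominant_heads(weights, threshold=0.1):
    """Return (head, weight) pairs at or above threshold, largest weight first."""
    picked = [(h, w) for h, w in weights.items() if w >= threshold]
    return sorted(picked, key=lambda hw: hw[1], reverse=True)

print(dominant_heads(HEAD_ATTR))          # [(1, 0.38), (5, 0.21)]
print(round(sum(HEAD_ATTR.values()), 2))  # 0.94
print(head_attribution([3.0, 4.0, 0.0, 0.0], n_heads=2, d_head=2))  # [1.0, 0.0]
```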
Negative Logits (token: logit)
Enlarge: -2.44
rye: -1.68
Freeze: -1.66
riers: -1.63
toggle: -1.63
UNESCO: -1.61
ICS: -1.61
************: -1.61
Fritz: -1.60
pengu: -1.59
Positive Logits (token: logit)
am: 2.82
AM: 2.81
amer: 2.54
ams: 2.50
amate: 2.37
amac: 2.29
amn: 2.25
aml: 2.24
amic: 2.17
ammy: 2.13
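Logit lists like these are usually obtained by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive tokens. This dashboard does not confirm that procedure, so the sketch below is an assumption. The function name `logit_extremes` and the toy vocabulary and matrices are hypothetical and chosen only to keep the example small.

```python
def logit_extremes(decoder_dir, unembed, vocab, k=10):
    """Hypothetical: project a feature direction onto the vocabulary.

    decoder_dir: list of d_model floats (the feature's decoder direction).
    unembed: d_model rows, each a list of len(vocab) floats.
    Returns (most_negative, most_positive) as lists of (token, logit).
    """
    vocab_size = len(vocab)
    logits = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]
    order = sorted(range(vocab_size), key=lambda t: logits[t])  # ascending
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 3 tokens.
vocab = ["am", "rye", "the"]
unembed = [[2.0, -1.0, 0.5],
           [0.0, 0.0, 0.0]]
neg, pos = logit_extremes([1.0, 0.0], unembed, vocab, k=1)
print(neg, pos)  # [('rye', -1.0)] [('am', 2.0)]
```

Read this way, the positive list above suggests the feature's direction promotes tokens that begin with "am".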
Activation Density: 0.011%
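Activation density is commonly defined as the fraction of token positions on which the feature's activation is above zero. The dashboard does not state its exact threshold, so the definition here is an assumption. Under that reading, 0.011% is 0.00011, or about 11 firings per 100,000 tokens, making this a very sparse feature. The helper `activation_density` and the synthetic activations below are hypothetical.

```python
def activation_density(activations, eps=0.0):
    """Hypothetical: fraction of positions where the activation exceeds eps."""
    fired = sum(1 for a in activations if a > eps)
    return fired / len(activations)

# Synthetic data: 11 nonzero activations spread over 100,000 token positions.
acts = [0.0] * 100_000
for i in range(11):
    acts[i * 9000] = 1.0

print(f"{activation_density(acts):.3%}")  # 0.011%
```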