Explanations
phrases emphasizing the concept of weight or burden
Head Attr Weights
0:0.03
1:0.01
2:0.09
3:0.07
4:0.16
5:0.04
6:0.06
7:0.27
8:0.05
9:0.04
10:0.08
11:0.05
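The weights above can be ranked to see which heads dominate. The following is a minimal sketch, assuming each `index:weight` pair is the attribution weight of one attention head (the page does not say so explicitly). The names `head_weights` and `ranked` are illustrative and do not come from the source.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each entry is the attribution weight of one attention head.
head_weights = {
    0: 0.03, 1: 0.01, 2: 0.09, 3: 0.07,
    4: 0.16, 5: 0.04, 6: 0.06, 7: 0.27,
    8: 0.05, 9: 0.04, 10: 0.08, 11: 0.05,
}

# Sort heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)

# Sum of the listed (rounded) weights; it need not equal exactly 1.
total = sum(head_weights.values())

for head, weight in ranked[:3]:
    print(f"head {head}: {weight:.2f} ({weight / total:.0%} of listed total)")
```

On these values, head 7 (0.27) and head 4 (0.16) rank first and second.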
Negative Logits
エル: -2.04
ortium: -1.76
AU: -1.71
xxxxxxxx: -1.69
uart: -1.67
theless: -1.67
UTC: -1.65
ipedia: -1.64
KEN: -1.60
SSL: -1.59
Positive Logits
rotting: 2.36
fumes: 2.08
fatigue: 1.99
odor: 1.89
sickness: 1.88
haze: 1.86
trembling: 1.84
sweat: 1.82
bruises: 1.80
piles: 1.76
Activations Density 0.000%