INDEX
Explanations
expressions of surprise or astonishment
Head Attr Weights
0:0.05
1:0.05
2:0.18
3:0.12
4:0.03
5:0.03
6:0.14
7:0.11
8:0.06
9:0.05
10:0.06
11:0.07
Negative Logits
fw          -1.55
ctors       -1.41
sembly      -1.41
adies       -1.40
showc       -1.35
traged      -1.32
livest      -1.31
tyr         -1.29
usterity    -1.29
ModLoader   -1.28
Positive Logits
mole        1.25
istg        1.16
housed      1.14
crosses     1.11
Moment      1.08
peak        1.08
龍契士      1.06
eta         1.01
Blanc       1.01
Bethlehem   1.00
Activations Density 0.018%