INDEX
Explanations
phrases emphasizing the importance of doing one's best in various contexts
Head Attr Weights
0:0.02
1:0.03
2:0.12
3:0.07
4:0.02
5:0.03
6:0.09
7:0.12
8:0.12
9:0.07
10:0.12
11:0.15
Negative Logits
asus     -1.17
[|       -1.10
vantage  -1.08
rikes    -1.07
raints   -1.07
Topic    -1.07
ould     -1.03
orius    -0.98
umper    -0.97
urus     -0.95
Positive Logits
VIDEOS        1.04
vitro         1.00
SERV          0.92
chores        0.91
SOURCE        0.91
selves        0.90
subordinates  0.90
synagogue     0.90
exorc         0.88
rall          0.88
Activations Density 0.004%