INDEX
Explanations
requests for audience feedback and thoughts
Head Attr Weights
0:0.03
1:0.01
2:0.09
3:0.11
4:0.14
5:0.03
6:0.06
7:0.22
8:0.04
9:0.05
10:0.06
11:0.10
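
The head attribution weights above can be read programmatically. The following is a minimal sketch, assuming each line has the form `head:weight` exactly as listed; the function name `parse_head_weights` is hypothetical and not part of any dashboard API.

```python
# Head attribution weights copied verbatim from the listing above.
RAW_HEAD_WEIGHTS = """0:0.03
1:0.01
2:0.09
3:0.11
4:0.14
5:0.03
6:0.06
7:0.22
8:0.04
9:0.05
10:0.06
11:0.10"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a {head_index: weight} dict.

    Hypothetical helper for illustration only.
    """
    weights = {}
    for line in text.splitlines():
        head, _, value = line.partition(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(RAW_HEAD_WEIGHTS)

# Rank heads from largest to smallest attribution weight.
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# Sum of the displayed (rounded) weights.
total = sum(weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

With the values as listed, head 7 carries the largest weight (0.22), followed by heads 4 and 3. The listed weights sum to 0.94 rather than 1; the source does not say whether this is due to rounding or to how the weights are defined.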
Negative Logits
faked: -1.24
ties: -1.23
suits: -1.22
pred: -1.18
whiff: -1.17
dinosaurs: -1.15
videot: -1.14
vanished: -1.12
satellites: -1.12
��極: -1.12
Positive Logits
clarification: 1.45
educate: 1.42
inquire: 1.42
accordingly: 1.41
zai: 1.40
cautiously: 1.40
arse: 1.39
opin: 1.38
someday: 1.38
enthusi: 1.38
Activations Density 0.004%
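
To put the density figure in concrete terms, here is a minimal sketch of the arithmetic. It assumes that "Activations Density" means the fraction of tokens on which the feature has a nonzero activation, which the source does not state explicitly; the function name `expected_active_tokens` is hypothetical.

```python
# Density as displayed above, in percent.
DENSITY_PERCENT = 0.004

# Convert the percentage to a plain fraction of tokens.
density_fraction = DENSITY_PERCENT / 100


def expected_active_tokens(n_tokens, fraction=density_fraction):
    """Expected number of tokens on which the feature fires.

    Hypothetical helper; assumes density is the fraction of tokens
    with a nonzero activation.
    """
    return n_tokens * fraction


# Average number of tokens between activations under that assumption.
tokens_per_activation = 1 / density_fraction

print(f"fraction of tokens: {density_fraction:.6f}")
print(f"expected active tokens per 1,000,000: {expected_active_tokens(1_000_000):.0f}")
print(f"roughly one activation every {tokens_per_activation:,.0f} tokens")
```

Under this reading, the feature fires on about 40 tokens per million, or roughly once every 25,000 tokens.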