INDEX
Explanations
questions and requests for information or feedback
Head Attr Weights
0:0.06
1:0.03
2:0.06
3:0.23
4:0.05
5:0.13
6:0.01
7:0.11
8:0.02
9:0.02
10:0.22
11:0.02
Negative Logits
kefeller       -1.83
reluctantly    -1.75
llah           -1.71
uez            -1.70
wives          -1.64
slaught        -1.57
uras           -1.55
Fif            -1.55
adulthood      -1.54
).[            -1.51
Positive Logits
or             2.04
commenting     2.00
typo           1.92
interesting    1.92
any            1.86
please         1.86
ado            1.83
informative    1.80
profiling      1.79
anything       1.78
Activations Density 0.101%