Explanations
concepts related to statistics and numerical comparisons
Head Attr Weights
0:0.15
1:0.24
2:0.03
3:0.05
4:0.02
5:0.16
6:0.03
7:0.01
8:0.11
9:0.05
10:0.05
11:0.04
Negative Logits
410        -1.69
standing   -1.66
540        -1.66
stood      -1.65
770        -1.63
660        -1.58
380        -1.53
530        -1.52
940        -1.50
{"         -1.50
Positive Logits
vs                  2.52
=================   2.12
VS                  2.11
Vs                  2.08
->                  2.06
……………………            1.79
Killed              1.79
-=                  1.78
的                  1.74
Versus              1.72
Activations Density 0.004%