INDEX
Explanations
words indicating probability or uncertainty
Head Attribution Weights
0:0.01
1:0.02
2:0.10
3:0.15
4:0.09
5:0.02
6:0.36
7:0.04
8:0.03
9:0.03
10:0.05
11:0.05
Negative Logits
Tud             -1.33
Lines           -1.30
Assistance      -1.27
Sheet           -1.25
cham            -1.24
SpaceEngineers  -1.24
Shot            -1.23
Response        -1.21
CT              -1.18
Armor           -1.17
Positive Logits
wiser           1.55
tremend         1.53
EEE             1.50
ividual         1.48
gotten          1.45
xus             1.42
staking         1.41
ACP             1.39
龍              1.37
』              1.35
Activation Density: 0.019%
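For reference, activation density is usually reported as the share of tokens in a text sample on which the feature's activation is nonzero. At 0.019%, this feature fires on roughly 19 of every 100,000 tokens. The minimal Python sketch below shows that calculation. The function name, the strict zero threshold, and the synthetic sample are illustrative assumptions, not the dashboard's own code.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens on which a feature fires.

    Assumption (hypothetical helper): a token counts as "firing" when its
    activation is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one token activation")
    # Count tokens whose activation clears the threshold.
    fired = sum(1 for a in activations if a > threshold)
    # Express the count as a percentage of all tokens in the sample.
    return 100.0 * fired / len(activations)


# Hypothetical sample: 100,000 tokens, of which 19 have a positive activation.
sample = [0.0] * 100_000
for i in range(19):
    sample[i * 5_000] = 1.0

print(f"{activation_density(sample):.3f}%")  # prints 0.019%
```

Raising `threshold` above zero gives a stricter density that ignores very weak activations. Dashboards may choose either convention, so the reported figure depends on that choice.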