INDEX
Explanations
numerical codes or identifiers
Negative Logits
Token   Logit
UTH     -0.18
IR      -0.17
ITY     -0.17
USR     -0.16
ULT     -0.16
UP      -0.16
ITS     -0.15
ULO     -0.15
THR     -0.15
[][]    -0.15
Positive Logits
Token   Logit
CB      0.22
AB      0.22
DE      0.22
BE      0.22
EB      0.21
FB      0.21
B       0.21
FE      0.20
DB      0.20
DE      0.20
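The dashboard does not say how these logit values were produced. For SAE feature dashboards they are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and listing the most negative and most positive entries. The sketch below assumes that method. The names `logit_effects`, `top_bottom`, `decoder_dir` and `W_U` are hypothetical, and the toy numbers are illustrative rather than taken from this feature.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], W_U: List[List[float]]) -> List[float]:
    """Compute decoder_dir @ W_U, where W_U has d_model rows and vocab_size columns."""
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical values).
decoder_dir = [1.0, -0.5]
W_U = [
    [0.2, -0.1, 0.0, 0.3],
    [0.4, 0.2, -0.2, 0.0],
]
vocab = ["CB", "UTH", "IR", "AB"]

effects = logit_effects(decoder_dir, W_U)
negative, positive = top_bottom(effects, vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

A real dashboard would apply the same ranking over the full vocabulary and keep the top ten of each sign, as in the tables above.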
Activation Density: 0.013%
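Activation density is usually read as the fraction of sampled tokens on which the feature fires. The sketch below assumes that definition and treats "fires" as an activation strictly above a threshold of 0. The source does not state its threshold or sample size, and `activation_density` is a hypothetical name. At 0.013%, the feature fires on roughly 13 of every 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds the threshold."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 13 firing tokens out of 100,000.
acts = [0.0] * 99_987 + [1.5] * 13
density = activation_density(acts)
print(f"{density:.3%}")
```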