Explanations
phrases related to essential components or foundational elements
Negative Logits
Token      Logit
itious     -0.20
ries       -0.18
arian      -0.17
rik        -0.17
riba       -0.17
orf        -0.16
ether      -0.16
rix        -0.16
ego        -0.15
ints       -0.15
Positive Logits
Token      Logit
ference    0.27
quisites   0.25
quisite    0.23
clr        0.21
/Core      0.21
lates      0.20
/core      0.20
lla        0.20
.Core      0.20
ll         0.20
Activation Density: 0.017%