Explanations
sequences of dashes or similar repeated characters
Negative Logits
„
-0.66
}">
-0.56
„
-0.55
⁃
-0.55
]")]
-0.52
(„
-0.52
◊
-0.49
}));
-0.49
"}>
-0.49
[…]
-0.48
Positive Logits
--
1.39
--
1.27
'--
1.12
.--
1.12
:--
1.09
"--
1.06
)--
1.05
('--
1.04
#--
1.02
!--
1.02
Activations Density 0.283%