INDEX
Explanations
instances in a transcript where the speaker is reasoning through a problem and vocalizing their thought process
Negative Logits
abouts  -0.07
Seit  -0.07
and  -0.07
اد  -0.06
ná  -0.06
áo  -0.06
åĨ  -0.06
amp  -0.06
ãģ²  -0.06
ide  -0.06
Positive Logits
yonel  0.09
.scalablytyped  0.08
-lnd  0.08
галÑĸ  0.07
(Initialized  0.07
_tF  0.07
ÙĪÛĮÙĨت  0.07
GINE  0.07
ÛĮÛĮÙĨ  0.07
↵↵  0.07
Activations Density: 0.974%