INDEX
Explanations
phrases indicating a final summary or conclusion
Negative Logits
unk           -0.16
adu           -0.15
.LogWarning   -0.15
IEL           -0.15
648           -0.14
iel           -0.14
avana         -0.14
иÑģ           -0.14
pesan         -0.14
Parr          -0.14
Positive Logits
Bottom        0.28
bottom        0.27
bottom        0.26
BOTTOM        0.26
.Bottom       0.25
(bottom       0.24
Bottom        0.24
-bottom       0.22
BOTTOM        0.21
/top          0.20
Activations Density: 0.017%