INDEX
Explanations
phrases that emphasize key conclusions or takeaways in discussions or analyses
Negative Logits
zin            -0.15
adu            -0.15
udget          -0.14
IENT           -0.14
van            -0.14
upp            -0.14
Allan          -0.14
upper          -0.14
pen            -0.14
.extensions    -0.14
Positive Logits
Bottom         0.24
.Bottom        0.23
bottom         0.23
(bottom        0.22
/top           0.22
line           0.21
BOTTOM         0.21
bottom         0.21
Bottom         0.20
line           0.20
Activations Density 0.016%