Explanations
unique characters or symbols in text
Negative Logits
('      -0.43
–       -0.38
(       -0.35
'       -0.33
(~      -0.30
(«      -0.29
‘       -0.28
("      -0.28
'[      -0.28
'(      -0.26
Positive Logits
-----↵  0.27
----↵   0.26
——      0.25
-↵      0.23
—I      0.23
--↵     0.23
—↵↵     0.22
—↵      0.21
---↵    0.20
------↵ 0.19
Activations Density 0.011%