Explanations
special characters or symbols that may indicate formatting or metadata
Negative Logits
[...]         -0.33
.....         -0.29
....          -0.28
...           -0.27
......        -0.27
[...          -0.27
(...)         -0.26
..."↵         -0.24
...↵          -0.24
..........    -0.24
Positive Logits
Gen           0.24
,…            0.24
…"            0.23
.…            0.23
…↵↵           0.23
…↵            0.23
…↵↵↵          0.22
,…↵↵          0.21
…             0.21
Gen           0.19
Activations Density 0.003%