Explanations
concepts related to evaluation and critique of ideas or products
NEGATIVE LOGITS
…………          -0.26
…↵↵             -0.20
……………………      -0.19
….              -0.19
..↵↵            -0.18
.               -0.18
……              -0.16
…↵              -0.15
→↵↵             -0.15
!!}             -0.15
POSITIVE LOGITS
...             0.54
)...            0.49
...↵            0.47
"...            0.42
...\            0.38
...'            0.38
...↵↵           0.38
...]            0.38
..."            0.37
..."↵           0.37
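Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative entries. A minimal sketch under that assumption (the function name and argument names are hypothetical, and any final layer norm is ignored):

```python
import numpy as np


def top_logit_effects(w_dec_row: np.ndarray, w_unembed: np.ndarray,
                      vocab: list, k: int = 10):
    """Project one feature's decoder vector onto the vocabulary.

    w_dec_row: shape (d_model,), the feature's decoder direction (assumed input).
    w_unembed: shape (d_model, vocab_size), the unembedding matrix.
    vocab:     token strings, indexed like the columns of w_unembed.
    Returns (top_positive, top_negative) as lists of (token, logit) pairs.
    """
    logits = w_dec_row @ w_unembed          # shape (vocab_size,)
    order = np.argsort(logits)              # indices sorted by ascending logit
    top_neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    top_pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top_pos, top_neg
```

With a toy two-dimensional model and a four-token vocabulary, `top_logit_effects(np.array([1.0, 0.0]), np.array([[0.5, -0.2, 0.1, 0.0], [9.0, 9.0, 9.0, 9.0]]), ["...", "…", ".", "!!}"], k=2)` ranks `"..."` highest and `"…"` lowest, because only the first row of the unembedding contributes.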
Activation density: 0.173%
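Activation density is the share of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly greater than zero and that the feature's activations have already been collected over a token sample (the function name and `threshold` parameter are hypothetical):

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of positions where the activation exceeds `threshold`.

    acts: this feature's activations over a token sample (any shape).
    """
    # Boolean mask of active positions; its mean is the active fraction.
    return 100.0 * float(np.mean(acts > threshold))
```

On this reading, a density of 0.173% means the feature is active on roughly 173 of every 100,000 tokens.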