Explanations
references to usefulness and practicality in creating or discussing concepts and mechanics
Negative Logits
?↵↵      -0.18
:**      -0.17
!!!↵↵    -0.17
:↵↵↵     -0.17
:↵↵      -0.17
:č↵č↵    -0.17
:↵↵↵↵    -0.16
??↵↵     -0.16
???↵↵    -0.16
?”↵↵     -0.15
Positive Logits
!        0.54
?        0.50
!");↵    0.45
!↵       0.45
?↵       0.41
!"       0.41
!↵↵      0.40
!!       0.38
?",      0.37
?↵↵      0.36
Activations Density: 0.070%