Explanations
code-related syntax and expressions, particularly those involving conditionals and variable checks
Negative Logits
Token    Logit
!'       -0.50
!’       -0.46
!}       -0.45
!”       -0.43
!',      -0.43
2        -0.43
!",      -0.41
!<       -0.41
!!!      -0.41
!!!"     -0.40
Positive Logits
Token    Logit
(((      0.60
((       0.59
((*      0.58
(__      0.57
(*       0.56
(_       0.55
([]      0.53
((&      0.53
(!__     0.49
(*(      0.49
Activations Density: 0.146%