Explanations
the presence of comparison operators or symbols related to conditional statements in code
Negative Logits
Token                 Logit
'                     -0.61
son                   -0.59
uniqu                 -0.55
(no visible token)    -0.55
лю                    -0.54
ation                 -0.54
ale                   -0.52
.                     -0.52
lolo                  -0.51
lam                   -0.51
Positive Logits
Token                 Logit
>                     1.85
displayquote          1.58
>>>>>>>>              1.51
$>$                   1.50
$>                    1.44
(>                    1.42
>>>>                  1.39
}>\                   1.38
>\                    1.37
.>                    1.36
Activations Density 0.132%
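
The density figure can be read as the share of token positions on which the feature fires at all. The sketch below illustrates that reading only; it assumes "fires" means an activation strictly greater than zero, and the function name `activation_density` and the sample data are hypothetical, not taken from the dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float]) -> float:
    """Return the fraction of token positions with a non-zero (positive) activation.

    Assumption: a feature "fires" on a token when its activation is > 0.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in values if a > 0.0)
    return fired / len(values)


# Synthetic illustration: 132 firing positions out of 100,000 tokens.
sample = [1.0] * 132 + [0.0] * (100_000 - 132)
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.132%"
```

Under this reading, a density of 0.132% means the feature is sparse: it is active on roughly 1 token in every 760.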