INDEX
Explanations
phrases indicating contradictions or negative statements
Negative Logits
__':              -0.72
ագրություններ     -0.68
endphp            -0.68
]")]              -0.67
>--}}             -0.65
InputBorder       -0.65
```               -0.65
IntoConstraints   -0.63
SequentialGroup   -0.61
"]:               -0.61
Positive Logits
(!)       0.88
!!!       0.80
(!)       0.80
?!?       0.78
!!!!!     0.78
freakin   0.77
!!!!!!    0.77
!!!!      0.77
¡¡¡       0.75
!         0.74
Activations Density 0.127%