INDEX
    Explanations

    patterns related to mathematical or logical symbols

    Negative Logits
    " itſelf"            -1.00
                         -1.00
    " виправивши"        -0.95
    "AntiForgeryToken"   -0.94
    "المناصب"            -0.93
    " estimés"           -0.93
    " Мексичка"          -0.92
    "-------------</"    -0.90
    "Geplaatst"          -0.90
    "kloped"             -0.89
    Positive Logits
    " >>"             0.80
    ">>"              0.79
    "[toxicity=0]"    0.60
    "##"              0.50
    " bullet"         0.50
    "<i>"             0.48
    "ched"            0.48
    "ode"             0.47
    ">>>>"            0.47
    "bullet"          0.46
    Activation Density: 0.147%

    No Known Activations