INDEX
    Explanations

    references to controversial behavior or actions

    Negative Logits
     AssemblyCompany    -0.57
    jalá                -0.57
    明明                  -0.54
    AfterClass          -0.53
    OGND                -0.53
    ínica               -0.52
    ẨM                  -0.50
    AntiForgeryToken    -0.49
    Xaml                -0.49
     Chwiliwch          -0.48
    Positive Logits
     sacré          0.85
     doo            0.84
    interesting     0.73
     pretty         0.72
     interesting    0.70
     mighty         0.69
     ouch           0.68
     lotta          0.67
     Interesting    0.66
     somethin       0.62
    Activation Density: 0.359%

    No Known Activations