INDEX
    Explanations

    references to punishment or legal consequences

    Negative Logits
    (token not shown)   -0.60
    "Guys"              -0.49
    " scene"            -0.49
    "  " (whitespace)   -0.48
    " $"                -0.48
    " American"         -0.48
    " Guys"             -0.48
    " Dean"             -0.47
    " guys"             -0.47
    " mode"             -0.46
    Positive Logits
    " autorytatywna"    0.82
    "protoimpl"         0.81
    "annica"            0.76
    " sonno"            0.75
    " للاسماء"          0.74
    " poichè"           0.72
    "imakasih"          0.71
    " الرياضيه"         0.71
    " '\\;'"            0.70
    " للمعارف"          0.70
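    The page does not say how the logit scores above are computed. A common approach for sparse-autoencoder feature dashboards, assumed here, is to project the feature's decoder direction onto the model's unembedding matrix and list the most negative and most positive entries. The sketch below illustrates that assumption only: `logit_effects` and `top_logits` are hypothetical helper names, the unembedding is a plain `d_model x vocab` list of lists, any final LayerNorm is ignored, and the numbers are toy values.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score every vocabulary token by decoder_vec . unembed[:, j].

    Assumption: `unembed` is d_model x vocab_size (one row per residual
    dimension) and no final LayerNorm is applied.
    """
    if len(unembed) != len(decoder_vec):
        raise ValueError("unembed must have one row per decoder dimension")
    scores = []
    for j, token in enumerate(vocab):
        # Dot product between the feature direction and token j's unembedding column.
        score = sum(d * row[j] for d, row in zip(decoder_vec, unembed))
        scores.append((token, score))
    return scores


def top_logits(
    scores: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ordered = sorted(scores, key=lambda pair: pair[1])
    negative = ordered[:k]        # most negative first
    positive = ordered[::-1][:k]  # most positive first
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up numbers).
decoder_vec = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.4, 0.0],
    [0.6, 0.2, -0.2, 0.1],
]
vocab = ["a", "b", "c", "d"]
negative, positive = top_logits(logit_effects(decoder_vec, unembed, vocab), k=2)
print("negative:", negative)
print("positive:", positive)
```

    Sorting once and slicing from both ends keeps the negative and positive lists consistent with each other; a real model would use a vectorized matrix product rather than Python loops.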
    Activation Density: 0.578%
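    Activation density is usually read as the percentage of sampled tokens on which the feature's activation is above zero; that definition is an assumption here, since the page does not state it. A minimal sketch follows: `activation_density` is a hypothetical name, and the counts in the example are invented purely to show the arithmetic behind a figure like 0.578%, not the dataset behind this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Made-up counts: 5,780 firing tokens out of 1,000,000 gives 0.578%.
sample = [0.0] * (1_000_000 - 5_780) + [1.3] * 5_780
print(f"{activation_density(sample):.3f}%")  # prints 0.578%
```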

    No Known Activations