INDEX
    Explanations
    NEGATIVE LOGITS
     rape             -0.59
    primarily         -0.56
    😐                 -0.55
     principally      -0.54
     sehr             -0.51
    Generally         -0.51
    larg              -0.51
     prostitutes      -0.50
     Generally        -0.50
     evidentemente    -0.49
    POSITIVE LOGITS
     your             0.90
     beginnetje       0.85
     '\\;'            0.76
     MainAxisSize     0.75
     summertime       0.73
     unforgettable    0.73
     springtime       0.71
     yourself         0.71
    rungsseite        0.69
     propOrder        0.68
    Activation Density: 0.227%

    No Known Activations