INDEX
    Explanations

    words related to misconduct or inappropriate behavior

    variations of the word "behavior."

    Negative Logits
    Token            Logit
    " Rwanda"        -0.74
    " Korean"        -0.71
    " Panthers"      -0.69
    " Panther"       -0.68
    " Nordic"        -0.66
    " Panzer"        -0.66
    " Purg"          -0.64
    "ãģĤ"            -0.64
    " Kinnikuman"    -0.64
    " Roof"          -0.63
    Positive Logits
    Token            Logit
    "beh"            1.45
    "aviour"         1.29
    " Beh"           0.96
    "terness"        0.89
    "behavior"       0.89
    "avin"           0.89
    "Beh"            0.84
    "abus"           0.83
    " behav"         0.81
    "ilib"           0.81
    Activation Density: 0.007%

    No Known Activations
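
The entry ãģĤ in the negative logits list looks like encoding debris, but it is plausibly a token shown in byte-level BPE vocabulary form. The sketch below assumes the model uses a GPT-2 style byte-to-character table (an assumption, since the page does not name the tokenizer) and maps such a string back to readable text. The function names are illustrative and not part of any library.

```python
def bytes_to_unicode():
    """Build the byte -> printable character table of GPT-2 style byte-level BPE."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    next_free = 0
    # Every remaining byte is assigned the next unused code point from 256 up.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + next_free)
            next_free += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: vocabulary character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_vocab_token(token: str) -> str:
    """Convert a token in vocabulary form back to the text it stands for.

    Expects the raw vocabulary form (for example "Ġbeh" for " beh");
    a literal space is not in the table and raises KeyError.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_vocab_token("ãģĤ"))  # → あ
```

Under that assumption, the token is the hiragana character あ rather than corrupted text. Tokens listed with a leading space, such as " Rwanda", would appear in the raw vocabulary with a Ġ prefix instead.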