INDEX
    Explanations

    phrases that signify self-harm or self-sabotage

    Negative Logits
    (Tokens are quoted; a leading space is part of the token.)
    " centers"     -0.53
    " centres"     -0.53
    "centers"      -0.53
    "ظر"           -0.43
    "驚き"         -0.40
    "inale"        -0.39
    "addGap"       -0.38
    "中心的"       -0.38
    "endwhile"     -0.38
    "TableHead"    -0.38
    Positive Logits
    " propOrder"        0.82
    "SBATCH"            0.79
    "tagHelper"         0.76
    "SequentialGroup"   0.73
    " autorytatywna"    0.72
    " unwittingly"      0.72
    " صوتيه"            0.71
    " AssemblyTitle"    0.70
    " оригіналу"        0.70
    " CreateTagHelper"  0.70
    Activation Density 0.409%

    No Known Activations