INDEX
    Explanations

    phrases related to destruction or harmful actions

    Negative Logits
    iras       -0.18
    esta       -0.17
    .truth     -0.15
    atal       -0.15
    oom        -0.14
     trời      -0.14
    cot        -0.14
     turb      -0.14
    esar       -0.14
    elier      -0.14
    Positive Logits
    urgeon           0.16
    essel            0.16
    pta              0.15
    věd              0.15
    ienne            0.15
    FileVersion      0.15
     Lambert         0.14
    zent             0.14
    spb              0.14
     RegexOptions    0.13
    Activation Density 0.026%

    No Known Activations