    Explanations

    cowardly, cowardice, or coward

    Negative Logits
    Token    Value
    "."      0.94
    " be"    0.83
    "ور"     0.80
    "ro"     0.77
    "نا"     0.77
    "res"    0.75
    "ل"      0.72
    "لر"     0.71
    "س"      0.71
    "ق"      0.70
    Positive Logits
    Token         Value
    "y"           0.84
    "]:"          0.69
    "findpost"    0.68
                  0.67
    "𝘢"           0.66
    "o"           0.65
    "IdleSync"    0.65
    " como"       0.64
    "ați"         0.64
                  0.64
    Activation Density: 0.001%

    No Known Activations