    NEGATIVE LOGITS
    "ди"               0.91
    "ри"               0.79
    " bothering"       0.75
    "王者"             0.75
    "ксе"              0.71
    " murderous"       0.70
    " unquestionably"  0.69
    " complaining"     0.68
    "مر"               0.68
    " murdering"       0.68
    POSITIVE LOGITS
    "f"                0.93
    " maan"            0.84
    "ic"               0.84
    "ay"               0.81
    "andet"            0.80
    " nahe"            0.79
    "holen"            0.79
    "and"              0.79
    "ta"               0.79
    "iation"           0.79
    Activation density: 0.000%

    No Known Activations