    NEGATIVE LOGITS
    (blank)  1.98
    פ  1.94
    (blank)  1.92
    ве  1.85
    не  1.84
    ри  1.82
    शिप  1.80
    ра  1.79
    তে  1.76
    мо  1.76
    POSITIVE LOGITS
    t  2.41
    dates  2.02
    tweets  2.00
    tio  1.99
    etheless  1.85
    dreams  1.81
    topics  1.80
    tion  1.78
    tips  1.77
    trials  1.76
    Activation Density: 0.014%

    No Known Activations