INDEX
    Explanations

    avoiding specific verb forms

    Negative Logits
    人們    0.51
     Ashanti    0.48
    0.47
     धूप    0.44
    ப்பிரிக்க    0.44
    朋友們    0.44
    Button    0.44
     outs    0.44
     домаћин    0.44
    maket    0.44
    Positive Logits
    ablo    0.47
    vede    0.44
    وف    0.43
    ef    0.43
    产生    0.41
    0.41
     wretched    0.41
    +    0.40
     lauf    0.39
    0.39
    Activation Density: 0.002%

    No Known Activations