    Negative Logits
    " بسیار"          0.46
    " очень"          0.43
    "极其"            0.43
    "!!!!"            0.42
    " crucially"      0.40
    " மிக"            0.40
    "!!!"             0.40
    "!!"              0.40
    "非常有"          0.40
    "Hints"           0.39

    Positive Logits
    "🤷"              0.88
    " shrug"          0.86
    " shrugged"       0.84
    "仕方"            0.78
    " inev"           0.74
    " unavoidable"    0.72
    " inevitable"     0.69
    "Anyway"          0.66
    " anyway"         0.64
    "まあ"            0.64
    Activation Density: 0.030%

    No Known Activations