INDEX
    Explanations

    often, typically, generally

    Negative Logits
    "۔"  0.68
    " which"  0.61
    " thats"  0.61
    "."  0.61
    " andre"  0.60
    " zur"  0.59
    " восто"  0.57
    " ہے۔"  0.56
    " victoria"  0.56
    " gato"  0.55
    Positive Logits
    " inherently"  0.80
    "往往"  0.78
    " সাধারণত"  0.71
    " not"  0.68
    "not"  0.68
    "often"  0.64
    " മാത്രമല്ല"  0.63
    "基本的に"  0.62
    " likely"  0.62
    " intrinsically"  0.60
    Activation Density 0.009%

    No Known Activations