    Explanations

    asking questions with is this/that

    Negative Logits
     గత  0.64
     теркәлү  0.60
     길이  0.59
     sulfon  0.58
     двух  0.57
    ljenje  0.55
     तबाद  0.54
     Stuttgart  0.54
     étale  0.54
     የመ  0.53
    Positive Logits
    i  0.70
    ي  0.69
    ن  0.66
    ت  0.64
    ↵↵  0.64
    ک  0.64
    ك  0.63
    0.61
    n  0.60
    AR  0.60
    Act Density 0.118%

    No Known Activations