INDEX
    Explanations

    ultimately leading to outcomes

    Negative Logits
    "ل"  0.77
    "ون"  0.75
    "on"  0.71
    "е"  0.68
    "ag"  0.66
    "and"  0.66
    "на"  0.64
    "ın"  0.62
    "ری"  0.61
    " będzie"  0.60
    Positive Logits
    " damals"  0.61
    " तत्कालीन"  0.60
    " তৎকালীন"  0.59
    " হয়েছিল"  0.56
    " తన"  0.56
    " zerstört"  0.55
    (whitespace-only token)  0.55
    " Beit"  0.54
    " convirtió"  0.54
    " إلى"  0.53
    Act Density 0.024%

    No Known Activations