    Explanations

    not, not following, not ideal

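    Negative Logits
    Token              Value
    " I"               1.10
    (whitespace)       1.05
    " tokamaks"        0.93
    " installments"    0.89
    "im"               0.87
    "و"                0.86
    " O"               0.85
    " L"               0.85
    "เรา"              0.82
    "נים"              0.82

    Positive Logits
    Token              Value
    "t"                1.25
    "tedir"            1.03
    "л"                1.02
    "ası"              0.99
    " on"              0.99
    "д"                0.91
    (not shown)        0.87
    "が可能"           0.83
    "dır"              0.82
    (not shown)        0.82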
    Activation Density: 0.548%

    No Known Activations