INDEX
    NEGATIVE LOGITS
    ্রমে          1.07
     forêts      1.06
     براي        1.05
                 1.05
     elephants   1.01
     بي          1.00
     وي          1.00
     അക          1.00
    daughters    0.99
    листы        0.98
    POSITIVE LOGITS
    '            2.08
    :            1.34
    (            1.26
    .            1.22
     loud        1.06
    ,            0.96
    I            0.92
    )            0.91
                 0.90
     louder      0.89
    Activation Density: 0.004%

    No Known Activations