    Negative Logits
    "a"            1.66
    "ه"            1.54
    (blank)        1.42
    "s"            1.40
    "т"            1.36
    "0"            1.30
    (blank)        1.28
    "ر"            1.22
    "i"            1.21
    "ل"            1.21

    Positive Logits
    " ha"          1.48
    " "            1.16
    " हा"          1.02
    " trein"       0.95
    " ха"          0.94
    " позволи"     0.94
    " przyję"      0.92
    " рассказа"    0.89
    " 밝혔"        0.88
    " Ha"          0.88

    Activation Density: 0.018%

    No Known Activations