    NEGATIVE LOGITS
     апреля        1.15
                   1.01
    wine           0.97
    alous          0.96
    jego           0.95
    edged          0.93
    exclude        0.92
     bothering     0.90
    Ց              0.90
    Repeat         0.89
    POSITIVE LOGITS
     attention     1.84
     homage        1.74
     tribute       1.42
     dividends     1.39
     Attention     1.38
    attention      1.28
     heed          1.26
    laşı           1.24
     attentions    1.11
    Attention      1.10
    Act Density 0.134%

    No Known Activations