INDEX
    Explanations

    ggerganov, Romanov, Kasparov

    NEGATIVE LOGITS
    вля               0.44
                      0.42
    工地              0.40
    াবার              0.39
    িনি               0.39
    DEFGHI            0.39
     Vaughan          0.38
    agis              0.38
     Bürgermeister    0.38
    ائیگی             0.38
    POSITIVE LOGITS
    ov                1.48
    OV                1.23
    ova               1.12
    мов               1.02
    atov              1.02
    тов               1.01
    arov              1.00
    нов               0.99
    дов               0.97
    anov              0.96
    Act Density 0.008%

    No Known Activations