    NEGATIVE LOGITS
    uncertain  0.58
    set        0.52
    name       0.49
    button     0.46
    marital    0.46
    usual      0.46
    ire        0.46
    height     0.45
    worth      0.44
    family     0.44
    POSITIVE LOGITS
     solcher   0.49
     фильм     0.47
     dessas    0.46
     філь      0.45
     фильма    0.44
    على        0.43
     ван       0.43
     eels      0.43
     миллиар   0.43
     як        0.42
    Activation Density 0.002%

    No Known Activations