    Explanations

    wise, reliable, thoughtful

    Negative Logits
    "sei"            0.35
    "Β"              0.35
    " Bett"          0.34
    "<sup>"          0.33
    " beetroot"      0.32
    " Darth"         0.31
    "zant"           0.31
    " согласно"      0.31
    " creamy"        0.31
    " octahedral"    0.31
    Positive Logits
    "उनके"           0.40
    " নজর"           0.36
    " njegov"        0.35
    "ोल"             0.33
    "اویز"           0.33
    "rparam"         0.33
    "রাজের"          0.33
                     0.32
    " Onun"          0.32
    " Fisher"        0.31
    Activation Density: 0.016%

    No Known Activations