INDEX
    Explanations

    negative expressions about respect and opinions in social contexts

    Negative Logits
    mektedir
    -0.80
    maktadır
    -0.64
    rsiniz
    -0.57
    <eos>
    -0.52
     almendras
    -0.47
     venons
    -0.47
    awaiter
    -0.47
     soggetto
    -0.46
    cination
    -0.45
    美味しかったです
    -0.45
    Positive Logits
     Савезне
    0.94
    InputBorder
    0.84
     itſelf
    0.82
    󠁿
    0.81
    новниш
    0.80
    ſelf
    0.78
    :]:
    0.78
     faſt
    0.77
    )':
    0.76
     transfieras
    0.75
    Activation Density: 0.329%

    No Known Activations