INDEX
    Explanations

    phrases signaling intention or preference

    Negative Logits
    (no visible token)    -0.53
    therefore             -0.52
    λοι                   -0.50
    (no visible token)    -0.50
    YesNo                 -0.50
    thus                  -0.49
    tuce                  -0.49
    ösungen               -0.49
    จึง                   -0.49
     salu                 -0.48
    Positive Logits
     nonetheless          0.80
     nevertheless         0.73
     trotzdem             0.71
    それにしても          0.70
    それでも              0.66
    $")                   0.65
     still                0.60
    </tfoot>              0.60
    (no visible token)    0.59
    ")->                  0.56
    Act Density 0.918%

    No Known Activations