INDEX
    Explanations

    discursive phrases introducing opinion or elaboration

    NEGATIVE LOGITS
    ]<<           -0.70
    </caption>    -0.68
    ]<<"          -0.68
    ]]=           -0.68
     prostu       -0.66
    __":          -0.65
     noqa         -0.63
     Penh         -0.60
    ÁB            -0.59
    eseorang      -0.58
    POSITIVE LOGITS
     yes          0.91
     why          0.87
     hey          0.78
     oh           0.77
     yet          0.74
     don          0.73
     who          0.73
     therein      0.72
     then         0.70
     yeah         0.70
    Act Density 0.135%

    No Known Activations