INDEX
    Explanations

    terms related to personal experiences and interactions, particularly those reflecting opinions and emotions

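    Negative Logits (tokens quoted to show leading spaces)
    " cannot"      -0.77
    "Cannot"       -0.74
    "cannot"       -0.73
    " Cannot"      -0.69
    " sahiptir"    -0.54   (Turkish: "has")
    " אנו"         -0.51   (Hebrew: "we")
    "ですので"     -0.49   (Japanese: "so / because it is")
    "mektedir"     -0.47   (Turkish verb suffix: "is ...ing")
    "maktadır"     -0.46   (Turkish verb suffix: "is ...ing")
    " egli"        -0.45   (Italian: "he")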
    Positive Logits (tokens quoted to show leading spaces)
    " isn"         1.59
    " aren"        1.51
    " shouldn"     1.34
    " weren"       1.32
    " wasn"        1.31
    " hasn"        1.31
    " wouldn"      1.29
    " didn"        1.26
    " doesn"       1.25
    "doesn"        1.25
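Logit lists like the ones above are typically produced by projecting the feature's decoder direction onto the model's unembedding matrix and ranking tokens by the result. The following is a minimal sketch in Python under that assumption. `decoder_dir`, `unembed`, and all numbers are hypothetical toy values, not data from this feature.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct logit effect per token: the dot product of the feature's
    decoder direction with that token's unembedding vector (assumed method)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative

# Hypothetical 2-dimensional toy model, for illustration only.
decoder_dir = [1.0, -0.5]
unembed = {
    " isn":    [1.2, -0.8],
    " aren":   [1.0, -1.0],
    " cannot": [-0.5, 0.6],
    "Cannot":  [-0.4, 0.5],
}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print(positive)
print(negative)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other. A real dashboard would do the same projection over the full vocabulary.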
    Activation density: 0.484%
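Activation density is commonly reported as the percentage of sampled tokens on which the feature fires. The following is a minimal sketch in Python. It assumes "fires" means an activation strictly above a threshold of 0, and it uses a hypothetical `activation_density` helper. The dashboard's exact sampling and threshold are not stated here.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)

# Hypothetical sample: the feature fires on 5 of 1,000 tokens.
sample = [0.0] * 995 + [2.3] * 5
print(activation_density(sample))  # 0.5
```

Under this definition, a density of 0.484% would mean the feature is active on roughly 5 of every 1,000 tokens.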

    No Known Activations