    Explanations

    possessive pronouns and terms related to attribution

    Negative Logits
    Token         Logit
    "<bos>"       -2.96
    " have"       -0.62
    " and"        -0.61
    "}{||"        -0.61
    ","           -0.61
    (blank)       -0.60
    "#!["         -0.59
    (blank)       -0.59
    " become"     -0.58
    "protected"   -0.58
    Positive Logits
    Token         Logit
    " thut"       1.68
    " fta"        1.60
    " Minang"     1.57
    " stockholm"  1.53
    " hcm"        1.48
    " Juf"        1.48
    " bandung"    1.45
    " aen"        1.45
    " desir"      1.44
    " fte"        1.44
    Activation Density: 0.126%

    No Known Activations