INDEX
    Explanations

    prepositions indicating location or placement in relation to objects

    Negative Logits
    edited          -0.64
     digest         -0.62
    aired           -0.60
    period          -0.59
    hy              -0.57
     abbrevi        -0.56
     staggered      -0.55
    oreal           -0.55
    utenberg        -0.55
     karma          -0.55
    Positive Logits
    DonaldTrump     0.82
    erous           0.80
    tops            0.72
    ibaba           0.70
    sers            0.70
    slaught         0.70
    lie             0.69
    yx              0.69
    btn             0.69
    sie             0.66
    Activation Density: 0.115%

    No Known Activations