    Explanations

    phrases indicating information or instructions given to others

    instances of the phrase "we were told."

    Negative Logits
    Token        Logit
    "Labor"      -0.74
    "adesh"      -0.68
    "ouf"        -0.68
    "hur"        -0.65
    "cession"    -0.65
    "aband"      -0.64
    "ent"        -0.63
    "aho"        -0.63
    "avez"       -0.61
    "ivot"       -0.61
    Positive Logits
    Token            Logit
    "tale"           0.80
    "llor"           0.72
    "ariat"          0.71
    "ÃĽ"             0.68
    " repeatedly"    0.68
    " perspect"      0.67
    "ļé"             0.67
    " proced"        0.65
    "Īè"             0.65
    " aback"         0.65
    Act Density 0.027%

    No Known Activations