INDEX
    Explanations

    phrases referring to physical actions or interactions

    prepositions and phrases indicating relationships or positions

    Negative Logits
     substituted    -0.67
    fried           -0.65
     replaced       -0.63
    orno            -0.59
     Jarrett        -0.59
     rewritten      -0.57
    ghai            -0.57
     Laksh          -0.56
    FIG             -0.56
     accompanied    -0.56

    Positive Logits
    ventory          0.72
     whom            0.71
    oba              0.67
     afar            0.65
    zeb              0.65
    insky            0.64
     deed            0.64
    xs               0.63
     advoc           0.60
    sword            0.58
    Activation Density: 0.445%

    No Known Activations