INDEX
    Explanations

    common articles and prepositions that indicate locations or relationships

    Head Attr Weights
    0:0.01
    1:0.01
    2:0.04
    3:0.04
    4:0.15
    5:0.02
    6:0.14
    7:0.35
    8:0.03
    9:0.03
    10:0.06
    11:0.06
    Negative Logits
    " wont"          -1.42
    "rats"           -1.33
    "itsch"          -1.32
    " essentials"    -1.30
    "ilitarian"      -1.30
    " lawy"          -1.30
    "avers"          -1.27
    "ér"             -1.27
    "iferation"      -1.27
    "jab"            -1.25
    Positive Logits
    " fray"          2.01
    "phabet"         1.69
    " orbit"         1.65
    "estamp"         1.51
    "ulia"           1.50
    " captcha"       1.50
    " ranks"         1.49
    " Via"           1.47
    " Wonderland"    1.38
    " Corps"         1.38
    Act Density: 0.016%

    No Known Activations