INDEX
    Explanations

    verbs indicating movement or action

    Head Attr Weights
    0:0.08
    1:0.09
    2:0.09
    3:0.08
    4:0.08
    5:0.07
    6:0.07
    7:0.07
    8:0.07
    9:0.08
    10:0.08
    11:0.08
    Negative Logits
    " Explain"      -2.18
    "eur"           -2.13
    "oided"         -2.11
    "ullivan"       -2.11
    " Lawyers"      -2.10
    "authorized"    -2.08
    "ioned"         -1.97
    "endas"         -1.96
    "ayson"         -1.94
    "inge"          -1.92
    Positive Logits
    " Jou"          2.24
    "Afee"          2.12
    " gradient"     1.98
    "Band"          1.98
    "ドラ"          1.95
    " proport"      1.92
    "Tes"           1.91
    "perty"         1.91
    " spacing"      1.91
    " tens"         1.88
    Act Density 0.000%

    No Known Activations