INDEX
    Explanations

    articles or definite articles in the text

    Head Attr Weights
    0:0.07
    1:0.08
    2:0.08
    3:0.08
    4:0.07
    5:0.07
    6:0.08
    7:0.07
    8:0.07
    9:0.10
    10:0.09
    11:0.08
    Negative Logits
    rahim         -2.42
    ワン          -2.34
    ullivan       -2.25
    イト          -2.22
     裏�          -2.17
    igl           -2.16
    ategories     -2.14
    hari          -2.10
    PLA           -2.09
    akedown       -2.07
    Positive Logits
     froze        2.22
     revoked      2.18
     skipped      2.10
     discount     2.06
     emitted      2.00
     trailed      2.00
     crunch       1.98
     drifted      1.97
     tend         1.97
     tended       1.95
    Act Density 0.000%

    No Known Activations