INDEX
    Explanations

    occurrences of the word "the" in various contexts

    Head Attribution Weights (head: weight)
    0:0.02
    1:0.04
    2:0.10
    3:0.12
    4:0.01
    5:0.02
    6:0.11
    7:0.07
    8:0.11
    9:0.20
    10:0.06
    11:0.10
    Negative Logits (token, logit)
    " Deliver"    -1.05
    "ーテ"        -1.02
    "habi"        -1.01
    "Cho"         -0.96
    "bour"        -0.96
    "ゴン"        -0.96
    "legate"      -0.95
    "orate"       -0.94
    "peer"        -0.91
    "egu"         -0.89
    Positive Logits (token, logit)
    " sake"           2.47
    " purposes"       1.98
    " reasons"        1.65
    "ummies"          1.37
    "ulz"             1.32
    "icion"           1.30
    " foreseeable"    1.27
    "erity"           1.22
    "izoph"           1.15
    "iencies"         1.15
    Activation Density: 0.078%

    No Known Activations