INDEX
    Explanations

    articles and prepositions indicating context

    Head Attr Weights
    0:0.02
    1:0.01
    2:0.10
    3:0.07
    4:0.20
    5:0.05
    6:0.06
    7:0.18
    8:0.05
    9:0.05
    10:0.08
    11:0.07
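
The head attribution weights above can be ranked to see which heads contribute most. The following is a minimal sketch, not part of the original dashboard: the values are copied from the listing, while the names `head_attr_weights` and `top_heads` are hypothetical and chosen only for illustration.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_attr_weights = {
    0: 0.02, 1: 0.01, 2: 0.10, 3: 0.07, 4: 0.20, 5: 0.05,
    6: 0.06, 7: 0.18, 8: 0.05, 9: 0.05, 10: 0.08, 11: 0.07,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights (hypothetical helper)."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sum of the listed weights, rounded to the two decimals shown in the listing.
total = round(sum(head_attr_weights.values()), 2)

print(top_heads(head_attr_weights))  # heads 4, 7 and 2 carry the largest weights
print(total)                         # 0.94, so the rounded weights do not sum to exactly 1
```

Under these values, heads 4 and 7 together account for 0.38 of the listed weight.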
    Negative Logits
    " transitions"  -1.76
    " interacted"   -1.60
    "humans"        -1.57
    " interacts"    -1.55
    " simulated"    -1.52
    "duction"       -1.51
    "instein"       -1.48
    " processes"    -1.48
    "gov"           -1.44
    "sequence"      -1.43
    Positive Logits
    "$$$$"          1.81
    " cheers"       1.64
    "ウス"          1.60
    " Mechdragon"   1.54
    " Vanity"       1.49
    "使"            1.47
    [token not rendered]  1.47
    "HY"            1.46
    " nickname"     1.45
    " Gund"         1.43
    Act Density 0.000%

    No Known Activations