INDEX
    Explanations

    phrases describing a comparison or certain types of actions

    instances of the word "this" and phrases that denote examples or references

    Negative Logits
    "istries"    -0.90
    "Ni"         -0.77
    "erate"      -0.70
    "half"       -0.68
    "sent"       -0.67
    "verning"    -0.65
    "wa"         -0.65
    "Wr"         -0.64
    "ãĥ´ãĤ¡"     -0.63
    "Fit"        -0.63
    Positive Logits
    " spoiled"       0.73
    " bookmark"      0.65
    " improvised"    0.60
    " outgoing"      0.60
    " tip"           0.60
    "agine"          0.59
    " sunrise"       0.58
    " modifier"      0.57
    " guiActive"     0.57
    "ragon"          0.57
    Activation Density: 0.103%

    No Known Activations