INDEX
    Explanations

    words related to certainty and emphasis

    expressions that indicate clarification or emphasis, particularly using phrases like "of course" and "in fact."

    Negative Logits
    Token            Logit
    "prus"           -0.67
    " pione"         -0.64
    " \'"            -0.62
    " oun"           -0.60
    "inki"           -0.60
    "ategory"        -0.60
    "krit"           -0.59
    "ailability"     -0.59
    "kefeller"       -0.59
    " Doodle"        -0.58
    Positive Logits
    Token            Logit
    ","              1.00
    ",,"             0.88
    ",."             0.81
    "oret"           0.77
    "*,"             0.77
    ".,"             0.69
    ")"              0.67
    " anyway"        0.65
    ",-"             0.64
    " anyways"       0.62
    Activation Density: 0.107%

    No Known Activations