INDEX
    Explanations

    words related to specific details or precise measurements

    phrases indicating clarifications or examples within a discussion

    Negative Logits
    Token           Logit
    hess            -0.66
    ":-             -0.64
    ciplinary       -0.64
     (?,            -0.63
    ourse           -0.61
    ussions         -0.60
    ses             -0.59
    dim             -0.59
    malink          -0.56
    ector           -0.56
    Positive Logits
    Token           Logit
    ardless         0.88
    )</             0.75
    !).             0.75
    ĪĴ              0.70
     spoiler        0.68
    udder           0.67
    ?).             0.66
    arently         0.66
     incidentally   0.65
    ãĢı             0.63
    Act Density 0.325%
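
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of token positions on which the feature's activation exceeds a threshold. This is an assumption about the metric, not something the page itself states. The function name `activation_density`, the zero threshold, and the sample data (13 firing positions out of 4000) are all hypothetical and chosen only so the arithmetic lands on 0.325%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (# positions where the feature fires) / (# positions).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 13 of 4000 token positions.
sample = [0.0] * 3987 + [1.2] * 13
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.325% would mean the feature is active on roughly 1 in 300 token positions.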

    No Known Activations