INDEX
    Explanations

    words related to uncovering or unveiling information

    occurrences of the word "reveals."

    Negative Logits
    Token            Logit
    " peaceful"      -0.64
    "hare"           -0.62
    "Handle"         -0.62
    " safe"          -0.60
    " automatic"     -0.60
    " reserve"       -0.60
    "aterasu"        -0.60
    " retired"       -0.59
    " handle"        -0.57
    " blanket"       -0.57
    Positive Logits
    Token            Logit
    " reveals"       3.13
    " confirms"      2.06
    " exposes"       1.71
    " shows"         1.68
    " demonstrates"  1.66
    " tells"         1.63
    " suggests"      1.62
    " discl"         1.61
    " proves"        1.60
    " explains"      1.59
    Activation Density 0.020%

    No Known Activations