INDEX
    Explanations

    questions suggesting uncertainty or lack of knowledge

    the phrase "who knows" and variations

    Negative Logits
    "ciating"        -0.86
    "herent"         -0.74
    "ItemTracker"    -0.69
    "aten"           -0.67
    "Rated"          -0.65
    "cially"         -0.64
    "lies"           -0.63
    "inance"         -0.63
    "esthesia"       -0.63
    "issance"        -0.62
    Positive Logits
    " fri"           0.64
    " scen"          0.63
    " how"           0.62
    "ROR"            0.61
    "rium"           0.59
    " Kitt"          0.59
    "amorph"         0.59
    " sew"           0.58
    " fuzz"          0.58
    " sunshine"      0.58
    Activation Density 0.047%

    No Known Activations