INDEX
    Explanations

    This neuron looks for instances where something almost, but not entirely, fits or meets expectations.

    Phrases indicating uncertainty or hesitation.

    Negative Logits
    Token          Logit
    "olan"         -0.84
    "uments"       -0.82
    "selage"       -0.73
    "cius"         -0.72
    " DRAG"        -0.67
    "runtime"      -0.65
    "ERAL"         -0.64
    "ogi"          -0.64
    "lessness"     -0.64
    "rys"          -0.62
    Positive Logits
    Token          Logit
    "icable"       0.85
    " bothered"    0.73
    " Enough"      0.72
    " spo"         0.72
    " shy"         0.69
    "Enough"       0.69
    "theless"      0.68
    " spoon"       0.67
    " enough"      0.67
    " reunited"    0.65
    Activation Density: 0.013%

    No Known Activations