INDEX
    Explanations

    phrases indicating uncertainty or doubt

    uncertainty about choices or information

    Negative Logits
    Token          Logit
    " surla"       -0.48
    " encar"       -0.45
    " BoxFit"      -0.41
    "adita"        -0.36
    " sibling"     -0.36
    " menudo"      -0.35
    "miştir"       -0.35
    " laid"        -0.35
    " frutos"      -0.34
    " snatched"    -0.34
    Positive Logits
    Token               Logit
    " unsure"           0.81
    " Uncertain"        0.69
    " Uncertainty"      0.68
    " uncertain"        0.68
    "Uncertainty"       0.67
    "uncertainty"       0.64
    " uncertainty"      0.62
    " uncertainties"    0.58
    " dunno"            0.57
    "我不知道"          0.56
    Activation Density: 0.014%

    No Known Activations