    Explanations

    This neuron activates on words expressing uncertainty or unpredictability (e.g., “uncertain,” “uncertainty,” “unpredictable”).

    Negative Logits
    Token           Logit
    445             -0.07
     Loy            -0.07
    vou             -0.07
     вор            -0.07
    "display        -0.07
    296             -0.06
     Loves          -0.06
     بإ             -0.06
    Shown           -0.06
    appName         -0.06
    Positive Logits
    Token           Logit
     uncertainty    0.13
     uncertain      0.11
     uncertainties  0.10
     cy             0.08
     tomorrow       0.07
    ypical          0.07
     uncert         0.07
    urray           0.07
     doubt          0.07
    erty            0.07
    Activation Density: 0.009%
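
    As a rough illustration of how a density figure like the one above could be computed, the sketch below treats density as the percentage of tokens on which the neuron's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the sample values are all assumptions made for illustration; they are not this page's actual method or data.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        # No tokens observed: report zero rather than dividing by zero.
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations for 8 tokens; only one is above zero.
sample = [0.0, 0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3f}%")  # prints "12.500%"
```

    Under this assumed definition, a density of 0.009% would correspond to roughly 9 active tokens per 100,000 sampled.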

    No Known Activations