INDEX
    Explanations

    phrases that express general awareness or common knowledge

    NEGATIVE LOGITS
    "ลาย"      -0.15
    "막"       -0.15
    "Desk"     -0.15
    " ワ"      -0.14
    "entrant"  -0.14
    "/tiny"    -0.14
    "wick"     -0.14
    "etting"   -0.14
    "racak"    -0.13
    "untime"   -0.13
    POSITIVE LOGITS
    " know"    0.34
    " knows"   0.30
    " known"   0.30
    "known"    0.28
    " Know"    0.27
    "know"     0.25
    "知"       0.25
    "-known"   0.25
    " 知"      0.24
    "Know"     0.24
    Act Density 0.129%

    No Known Activations