INDEX
    Explanations

    instances of knowledge and awareness in various contexts

    Negative Logits
    igo       -0.18
    alars     -0.15
    antro     -0.15
    anca      -0.14
    acades    -0.14
    /or       -0.14
    ieux      -0.14
    wizard    -0.14
    Ñħи       -0.14
    imat      -0.13
    Positive Logits
    -how      0.16
    uckle     0.16
    rf        0.15
    upp       0.14
    æĤī       0.14
    ession    0.14
    zia       0.14
    arth      0.14
    akk       0.13
    ORTH      0.13
    Activation Density 0.100%

    No Known Activations