    Explanations

    references to HTML elements by their IDs and selectors

    Negative Logits
    w        -0.14
    olum     -0.14
    wares    -0.13
    al       -0.13
    ãģĻ      -0.13
    ep       -0.13
    ãģ£      -0.13
    foy      -0.13
    odo      -0.13
    zu       -0.13

    Positive Logits
    ovit     0.16
    alice    0.15
    olla     0.15
    sonian   0.15
    isoft    0.15
    pmat     0.15
    brig     0.15
    ieux     0.15
     Spo     0.15
    ÏĦι      0.14

    Act Density 0.024%

    No Known Activations