INDEX
    Explanations

    elements indicating boolean states or flags in configurations

    Negative Logits
    Token     Logit
    lore      -0.18
    ierce     -0.16
    lot       -0.16
    lor       -0.16
    din       -0.16
    icks      -0.16
    odon      -0.15
    ri        -0.15
    esson     -0.15
    ow        -0.15
    Positive Logits
    Token     Logit
    /false    0.22
    ushima    0.15
    assen     0.15
    vais      0.15
    hetic     0.14
    izoph     0.14
    oplast    0.14
    ongs      0.14
    andid     0.14
    kommen    0.14
    Activation Density: 0.038%

    No Known Activations