    Explanations

    statements that indicate existence or presence

    Negative Logits (token, logit; quotes show leading spaces)
        "ml"         -0.17
        "li"         -0.16
        "loo"        -0.16
        "uesto"      -0.15
        "ynch"       -0.15
        "okino"      -0.14
        "rest"       -0.14
        " Vital"     -0.14
        "vection"    -0.14
        "ASI"        -0.14
    Positive Logits (token, logit; quotes show leading spaces)
        "amo"         0.25
        " trov"       0.20
        " può"        0.19
        " tro"        0.18
        " è"          0.18
        " era"        0.18
        " inn"        0.18
        "oux"         0.17
        " diffuse"    0.17
        " tratt"      0.17
    Activation Density: 0.002%

    No Known Activations