INDEX
    NEGATIVE LOGITS
    Initially        0.46
    Table            0.46
    -;               0.45
    Windows          0.43
     Entomology      0.42
                     0.42
    Os               0.41
    typical          0.41
    Interestingly    0.40
    ̬                0.40
    POSITIVE LOGITS
    '})              0.48
     saucepan        0.47
     继续            0.45
    <unused34>       0.45
    сез              0.45
     boc             0.44
     слу             0.44
     کار             0.43
     alfabet         0.43
     vías            0.43
    Activation Density: 0.002%

    No Known Activations