INDEX
    Explanations

    numbers with decimal places or LaTeX

    Negative Logits
    "Os"       0.92
    "We"       0.86
    "Ik"       0.85
    "Fr"       0.82
    "Just"     0.80
    "Jest"     0.79
    "Joy"      0.79
    "Den"      0.77
    "Evalu"    0.76
    "Sign"     0.75
    Positive Logits
    (token not shown)    0.95
    "Ϩ"                  0.94
    "amethasone"         0.93
    " высоко"            0.92
    " быть"              0.86
    "นด์"                 0.85
    " mannitol"          0.84
    "ι"                  0.82
    "нской"              0.81
    " электро"           0.81
    Act Density 0.002%

    No Known Activations