INDEX
    Explanations

    looking for specific details

    Negative Logits
    "ítás"                0.46
    " deeper"             0.46
    " beyond"             0.45
    "积极"                0.42    (Chinese: "active, positive")
    " seriously"          0.42
    "beyond"              0.38
    " активно"            0.38    (Russian: "actively")
    "óság"                0.38
    " critically"         0.38
    " twice"              0.37
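    Positive Logits
    " Directly"           0.51
    "直接"                0.46    (Chinese: "directly")
    " stabilise"          0.45
    "Deterministic"       0.44
    "directly"            0.43
    " 直接"               0.43    (Chinese: "directly")
    "主要是"              0.42    (Chinese: "mainly is")
    " directly"           0.42
    " principalement"     0.40    (French: "mainly")
    " essentiellement"    0.40    (French: "essentially")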
    Activation Density: 0.009%

    No Known Activations