INDEX
    Explanations

    state of being ON or OFF

    Negative Logits
    Token                Logit
    "复杂的"             0.31
    "複雑"               0.31
    "向量"               0.30
    "যাপন"               0.29
    "เชิง"                0.29
    " विनाश"             0.28
    "を持つ"             0.28
    "اصيل"               0.28
    "ོས་"                 0.28
    " simplification"    0.27
    Positive Logits
    Token                Logit
    " ablaze"            0.46
    " completely"        0.44
    " asleep"            0.44
    " unlocked"          0.44
    " locked"            0.43
    " knocked"           0.43
    "completely"         0.43
    " awake"             0.42
    "处于"               0.41
    " primed"            0.41
    Activation Density: 0.087%

    No Known Activations