INDEX
    Explanations

    phrases indicating transitions or changes in states or conditions

    Negative Logits
    Token         Logit
    "329"         -0.17
    "istogram"    -0.15
    "onu"         -0.14
    "zzo"         -0.14
    "652"         -0.13
    "onden"       -0.13
    "æĸ¹éĿ¢"      -0.13
    "achu"        -0.13
    "ws"          -0.13
    "affen"       -0.13

    Positive Logits
    Token         Logit
    " being"      0.28
    " merely"     0.23
    "being"       0.22
    " mere"       0.22
    " Being"      0.20
    "被"          0.20
    " strength"   0.19
    "mere"        0.19
    " zero"       0.19
    "strength"    0.19

    Act Density 0.098%

    No Known Activations