INDEX
    Explanations

    concepts related to philosophical discussions about power and transformation

    Negative Logits
    Token       Logit
    "lg"        -0.16
    "wers"      -0.16
    "awai"      -0.15
    "sep"       -0.15
    "enberg"    -0.15
    "upy"       -0.14
    "irit"      -0.14
    "ando"      -0.14
    "hausen"    -0.14
    "anken"     -0.13
    Positive Logits
    Token          Logit
    " itself"      0.63
    " unto"        0.58
    " alone"       0.40
    " themselves"  0.39
    " Alone"       0.29
    "alone"        0.29
    " herself"     0.28
    " сами"        0.27
    " сама"        0.27
    " induction"   0.25
    Activation Density: 0.080%

    No Known Activations