INDEX
    Explanations

    terms related to evaluation or comparison of experiences

    Negative Logits
    Token          Logit
    "iaux"         -0.16
    "undi"         -0.15
    "каж"          -0.14
    "Ĺi"           -0.14
    "appiness"     -0.14
    "olik"         -0.14
    "taÅŁ"         -0.13
    "½Ķ"           -0.13
    "loff"         -0.13
    "gn"           -0.13
    Positive Logits
    Token          Logit
    " ever"        1.37
    "ever"         1.05
    "-ever"        1.04
    " EVER"        0.99
    " Ever"        0.95
    "Ever"         0.90
    " jamais"      0.61
    "EVER"         0.60
    "soever"       0.44
    " Everett"     0.42
    Activation Density: 0.197%

    No Known Activations