INDEX
    Explanations

    references to personality traits and self-reflection

    Negative Logits
    Token           Logit
    " zusammen"     -0.14
    "ويل"           -0.14
    " Primitive"    -0.13
    " جل"           -0.13
    " unrelated"    -0.13
    "illion"        -0.13
    "illon"         -0.13
    "olas"          -0.12
    "чих"           -0.12
    "uster"         -0.12
    Positive Logits
    Token           Logit
    " amb"          0.41
    " ambiguity"    0.34
    " mixed"        0.33
    " ambiguous"    0.33
    " oscill"       0.32
    " undecided"    0.32
    " split"        0.31
    " neither"      0.31
    " gray"         0.31
    "mixed"         0.30
    Activation Density: 0.488%

    No Known Activations