    Explanations

    concepts related to self-awareness and personal behavior

    NEGATIVE LOGITS
    [               -0.61
    '               -0.57
                    -0.53
     [              -0.52
                    -0.49
     R              -0.47
    H               -0.46
    G               -0.45
    x               -0.44
    &               -0.44
    POSITIVE LOGITS
     myſelf         1.26
     itſelf         1.21
     Monfieur       1.17
     Efq            1.13
     ujednoznacz    1.10
     themſelves     1.09
    onViewCreated   1.09
     autorytatywna  1.09
     doubtnut       1.08
     bezeichneter   1.07
    Activation Density 0.239%

    No Known Activations