INDEX
    Explanations

    instances of observational conclusions and reported insights

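    Negative Logits
      Token              Logit
      " queſta"          -1.06   (Italian "questa", written with a long s: "this")
      "parsedMessage"    -1.05
      " パンチラ"         -1.04   (Japanese slang "panchira")
      "<unused41>"       -1.04
      "<unused8>"        -1.03
      "<unused3>"        -1.03
      "<unused14>"       -1.03
      "<unused16>"       -1.03
      "<pad>"            -1.03
      "[@BOS@]"          -1.03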
    Positive Logits
      Token              Logit
      " that"             0.83
      " which"            0.47
      " who"              0.38
      " believe"          0.36
      " faptul"           0.36   (Romanian: "the fact")
      "0"                 0.34
      "1"                 0.33
      " nadzieję"         0.33   (Polish: "hope")
      "2"                 0.32
      (whitespace token)  0.31
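Logit lists like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that procedure and ignores any final layer norm. `top_logits`, `w_dec_f`, `W_U`, and the toy vocabulary are hypothetical names chosen for illustration, not the dashboard's actual code.

```python
def feature_logits(w_dec_f, W_U):
    """Project one feature's decoder vector (length d_model) onto the vocabulary.

    W_U is a hypothetical unembedding given as d_model rows of length vocab_size.
    Assumption: no final layer norm is applied before the projection.
    """
    vocab_size = len(W_U[0])
    return [
        sum(w_dec_f[d] * W_U[d][t] for d in range(len(w_dec_f)))
        for t in range(vocab_size)
    ]


def top_logits(w_dec_f, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, logit) pairs."""
    logits = feature_logits(w_dec_f, W_U)
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])  # ascending
    negative = ranked[:k]           # lowest scores first
    positive = ranked[::-1][:k]     # highest scores first
    return positive, negative


# Toy example: d_model = 2, vocab_size = 4 (all values are made up).
vocab = [" that", "<pad>", " who", " which"]
w_dec_f = [1.0, 0.0]
W_U = [
    [0.5, -1.0, 0.2, 0.9],
    [0.3, 0.3, 0.3, 0.3],
]
positive, negative = top_logits(w_dec_f, W_U, vocab, k=2)
print(positive)  # [(' which', 0.9), (' that', 0.5)]
print(negative)  # [('<pad>', -1.0), (' who', 0.2)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other. With a real vocabulary, `k=10` reproduces the ten-entry layout used above.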
    Act Density 0.207%
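Activation density is usually read as the fraction of token positions on which the feature fires. Under that assumption, 0.207% corresponds to roughly 2 of every 1,000 tokens. The minimal sketch below assumes "fires" means an activation above zero. `activation_density` and its `threshold` parameter are hypothetical, not the dashboard's own implementation.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: the dashboard counts a token as active when activation > 0.
    """
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Toy example: the feature fires on 2 of 1,000 tokens (values are made up).
acts = [0.0] * 1000
acts[10] = 3.2
acts[500] = 1.7
density = activation_density(acts)
print(f"Act Density {density:.3%}")  # Act Density 0.200%
```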

    No Known Activations