INDEX
    Explanations

    terms that indicate self-reflection and introspection

    Negative Logits
    Token           Logit
    " يتيمه"        -0.72
    " ejus"         -0.72
    " whor"         -0.70
    " coisa"        -0.70
    "робнее"        -0.69
    " Dumas"        -0.69
    "paravant"      -0.69
    "covariance"    -0.68
    " оригіналу"    -0.67
    " brancas"      -0.67
    Positive Logits
    Token           Logit
    " reflected"    2.28
    " reflecting"   2.26
    " reflection"   2.25
    " reflect"      2.24
    " reflections"  2.16
    " Reflect"      2.16
    " reflects"     2.16
    "reflect"       2.05
    " refle"        1.97
    " Reflection"   1.92
    Activation Density: 0.071%

    No Known Activations