INDEX
    Explanations

    self-reflection and introspection

    Negative Logits
    " 최대한"        0.41
    " melindungi"   0.41
    "utiliser"      0.40
    " Schutz"       0.40
    "安心して"      0.39
    " tensor"       0.38
    " Wille"        0.37
    " menghindari"  0.37
    "protocols"     0.37
    " Android"      0.37
    Positive Logits
    " introspection"  0.88
    " reflexión"      0.82
    "introspection"   0.79
    " intros"         0.79
    " réflexion"      0.70
    " reflection"     0.68
    "reflection"      0.67
    "回顾"            0.66
    "Reflection"      0.66
    " journaling"     0.64
    Activation Density: 0.059%

    No Known Activations