    Explanations

    concepts and definitions

    Negative Logits
    " gebruiken"      0.49
    " használ"        0.47
    " comical"        0.47
    " essayé"         0.46
    " halaman"        0.46
    " goldfish"       0.46
    " manually"       0.45
    " automóviles"    0.45
    " empfe"          0.45
    " automóvil"      0.45
    Positive Logits
    " Advocacy"       0.47
    "理解"            0.45
    "🧠"              0.44
    " homeostasis"    0.43
    " развитие"       0.43
    "Learning"        0.43
    " развитию"       0.43
    "Social"          0.42
    " नैतिकता"         0.40
    "isering"         0.40
    Activation Density: 0.387%

    No Known Activations