INDEX
    Explanations

    references to foundational principles or frameworks based on empirical data

    Negative Logits (token, logit)
    "iba"            -0.16
    "quoise"         -0.15
    "µľ"             -0.15
    "okino"          -0.14
    "ADE"            -0.14
    "/Dk"            -0.14
    "ɵ"              -0.14
    "orelease"       -0.14
    "enci"           -0.14
    "qua"            -0.13

    Positive Logits (token, logit)
    " experience"     0.35
    " experiences"    0.31
    " principles"     0.29
    " observations"   0.29
    " observation"    0.28
    "experience"      0.27
    " principle"      0.27
    " feedback"       0.26
    " input"          0.25
    "observations"    0.24
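A minimal sketch of how ranked positive and negative logit lists like the ones above can be produced, assuming (this page does not state it) that each value is the feature's decoder direction projected through the model's unembedding matrix. The function name `feature_logit_effects` and the parameters `w_dec_row`, `w_unembed`, and `vocab` are hypothetical, chosen only for illustration.

```python
import numpy as np

def feature_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (hypothetical helper).

    Assumption: effect = decoder direction @ unembedding matrix.
    w_dec_row: shape (d_model,); w_unembed: shape (d_model, vocab_size).
    """
    effects = w_dec_row @ w_unembed  # one logit contribution per vocabulary token
    order = np.argsort(effects)      # indices sorted from most negative to most positive
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.3, -0.1, 0.2, -0.2],
                      [0.0,  0.0, 0.0,  0.0]])
positive, negative = feature_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(positive)  # tokens "a" and "c", largest effect first
print(negative)  # tokens "d" and "b", most negative first
```

Tokens carrying a leading space (such as `" experience"`) are distinct vocabulary entries from their unspaced forms, which is why both variants can appear in the same list.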
    Activation Density: 0.307%
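Activation density is read here as the percentage of token positions on which the feature fires. This is an assumed definition, since the page does not spell it out. The sketch below counts activations above a threshold of zero; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds `threshold`.

    Hypothetical helper; assumes `acts` holds one activation value per token position.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# A feature that fires on 3 of 1000 tokens has a density of about 0.3%.
sample = np.zeros(1000)
sample[[10, 500, 900]] = 1.5
print(activation_density(sample))
```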

    No Known Activations