INDEX
    Explanations

    references to mice and related experimental conditions

    Negative Logits
    Ļ         -2.92
    ı         -2.87
    Į         -2.73
    ĸ´        -2.70
    Ľ         -2.67
    ħ         -2.65
    »         -2.64
    ¨         -2.63
    ĻĤ        -2.61
    Ħ         -2.58
    Positive Logits
     endif        1.70
     </           1.55
     truth        1.40
     &\           1.36
     tactic       1.36
     Lie          1.36
     intuition    1.34
     Algebra      1.31
     commut       1.30
    ======        1.29
    Activation Density: 0.202%

    No Known Activations