    Negative Logits
        "Info"         -0.07
        "EOS"          -0.07
        "reen"         -0.07
        " core"        -0.07
        " kernel"      -0.06
        " strides"     -0.06
        " legends"     -0.06
        "Pro"          -0.06
        "-dot"         -0.06
        "(column"      -0.06
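    Positive Logits
        " punishment"     0.14
        " punish"         0.13
        " punished"       0.12
        " punishments"    0.10
        " punishing"      0.09
        " Pun"            0.08
        " yasak"          0.08   (Turkish: "forbidden, ban")
        " rewarded"       0.07
        " наказ"          0.07   (Russian: prefix of наказание, "punishment")
        " punishable"     0.07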
    Activation Density: 0.006%

    No Known Activations