INDEX
    Explanations

    students, teaching

    Negative Logits
    Token               Logit
    " Layers"           -0.07
    "registered"        -0.06
    "_controls"         -0.06
    "  "                -0.06
    "VERIFY"            -0.06
    " dan"              -0.06
    " Dear"             -0.06
    " CCD"              -0.06
    " MET"              -0.06
    "athroom"           -0.06
    Positive Logits
    Token               Logit
    " epoch"            0.08
    " heated"           0.07
    "-env"              0.06
    " сал"              0.06
    " roar"             0.06
    " Afro"             0.06
    " philosophical"    0.06
    "('')↵"             0.06
    "plaint"            0.06
    " scipy"            0.06
    Activation Density: 0.052%

    No Known Activations