    Explanations

    root mean square

    Negative Logits
    Token          Logit
    " bias"        -0.08
    (not shown)    -0.07
    " gradient"    -0.06
    " motiv"       -0.06
    "stad"         -0.06
    "تش"           -0.06
    " tuần"        -0.06
    "де"           -0.06
    " депут"       -0.06
    " часа"        -0.06
    Positive Logits
    Token          Logit
    " rms"         0.07
    "anticipated"  0.07
    "rc"           0.07
    " Rc"          0.06
    "lobber"       0.06
    "\trc"         0.06
    "ocu"          0.06
    "packing"      0.06
    ".prompt"      0.06
    "binding"      0.06
    Activation Density: 0.003%

    No Known Activations