    Explanations

    references to channels or systems of organization

    NEGATIVE LOGITS
      
    -0.60
    '
    -0.55
    -0.54
       
    -0.52
    -0.49
    -0.48
     tak
    -0.48
    !
    -0.47
     [
    -0.47
    ...
    -0.47
    POSITIVE LOGITS
    ."));
    1.07
    ^(@)
    1.01
     Efq
    0.96
     Theſe
    0.93
    )");
    
    0.91
     Monfieur
    0.89
     $_"
    0.89
    channels
    0.88
     ―――――
    0.85
     myſelf
    0.84
    Act Density 0.429%

    No Known Activations