INDEX
    Explanations
    NEGATIVE LOGITS
    Token          Logit
    "Rad"          -0.07
    " rad"         -0.07
    " composed"    -0.07
    " LAND"        -0.07
    " Rad"         -0.06
    " Rud"         -0.06
    " dare"        -0.06
    " Verd"        -0.06
    " Kol"         -0.06
    " ded"         -0.06

    POSITIVE LOGITS
    Token          Logit
    " Switch"      0.11
    " switch"      0.10
    " switches"    0.10
    "Switch"       0.10
    " switching"   0.09
    " SWITCH"      0.08
    ".switch"      0.08
    "witch"        0.08
    "UCH"          0.08
    " switched"    0.07

    Activation Density: 0.012%

    No Known Activations