INDEX
    NEGATIVE LOGITS
    𒐪
    0.92
    <unused2121>
    0.91
    <unused2054>
    0.90
    <unused2138>
    0.90
     orthodox
    0.87
     buddhav
    0.86
    0.85
    <unused2142>
    0.84
     SUBSCRIBE
    0.84
    <unused2168>
    0.82
    POSITIVE LOGITS
    ↵↵
    1.86
    ↵↵↵↵
    1.71
    1.68
    ↵↵↵↵↵↵
    1.60
    ↵↵↵↵↵
    1.59
    ↵↵↵
    1.45
    ↵↵↵↵↵↵↵
    1.37
    ↵↵↵↵↵↵↵↵
    1.24
       
    1.24
        
    1.22
    Activation Density: 1.707%

    No Known Activations