    Negative Logits
    " vandal"       -0.10
    " editorial"    -0.09
    " ....↵↵"       -0.08
    " inscr"        -0.08
    " beo"          -0.08
    " прим"         -0.08
    "blu"           -0.08
    " approval"     -0.08
    " blazer"       -0.08
    "批准"          -0.08
    Positive Logits
    " pip"          0.09
    "(pipe"         0.09
    " aze"          0.08
    " पाइ"          0.08
    " pipe"         0.08
    " piping"       0.08
    " condenser"    0.08
    " purge"        0.08
    " pipes"        0.08
    " dil"          0.08
    Activation Density: 0.005%

    No Known Activations