INDEX
    NEGATIVE LOGITS
    "tained"          0.44
    (whitespace)      0.43
    "↵↵"              0.43
    (whitespace)      0.42
    (whitespace)      0.41
    (whitespace)      0.41
    (whitespace)      0.41
    "(#"              0.41
    "candidate"       0.40
    "↵↵↵↵↵↵"          0.40
    POSITIVE LOGITS
    " often"          0.86
    " oftentimes"     0.81
    " notoriously"    0.78
    " inherently"     0.75
    " spesso"         0.75
    "often"           0.74
    " často"          0.74
    " Often"          0.68
    " souvent"        0.68
    " usually"        0.66
    Act Density 0.940%

    No Known Activations