    Negative Logits
    ' myſelf'               -0.89
    (token not shown)       -0.82
    'InjectAttribute'       -0.81
    'ftagPool'              -0.81
    'fjspx'                 -0.76
    'ſelves'                -0.76
    ' passports'            -0.74
    ' Majefty'              -0.74
    ' purpoſe'              -0.74
    ' <=",'                 -0.74
    Positive Logits
    ' w'                     0.49
    (whitespace-only token)  0.48
    ' tri'                   0.47
    'сов'                    0.47
    'messer'                 0.46
    'lowski'                 0.46
    ' over'                  0.45
    ' dit'                   0.45
    ' di'                    0.44
    'acas'                   0.44
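The two logit lists are usually read as the feature's direct effect on the output vocabulary. Below is a minimal sketch, assuming (this page does not confirm it) that each score is the dot product of the feature's decoder direction with one column of the model's unembedding matrix, and that the lists are the most negative and most positive of those scores. `logit_effects`, `decoder_dir`, `w_unembed`, and `vocab` are hypothetical names, and the inputs are toy data.

```python
from typing import Sequence


def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical helper: project a feature's decoder direction through an
    unembedding matrix (d_model rows x vocab columns) and return the k most
    negative and k most positive (token, score) pairs."""
    d_model = len(decoder_dir)
    n_vocab = len(vocab)
    if len(w_unembed) != d_model or any(len(row) != n_vocab for row in w_unembed):
        raise ValueError("w_unembed must have shape (len(decoder_dir), len(vocab))")

    # Score for token j = sum_i decoder_dir[i] * w_unembed[i][j]
    scores = [
        sum(decoder_dir[i] * w_unembed[i][j] for i in range(d_model))
        for j in range(n_vocab)
    ]

    # Sort ascending: the front holds the most negative scores,
    # the back (read in reverse) holds the most positive ones.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy usage: a 2-dimensional "model" with a 3-token vocabulary.
neg, pos = logit_effects(
    [1.0, -0.5],
    [[0.2, -0.4, 0.1], [0.6, 0.2, -0.8]],
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

With these toy inputs the scores are about -0.1, -0.5 and 0.5 for `a`, `b` and `c`, so `b` heads the negative list and `c` the positive one.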
    Activation Density: 0.293%
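Activation density is the share of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the actual threshold and token sample behind the 0.293% figure are not stated on this page); `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose feature
    activation is strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active position out of four gives a density of 25%.
print(activation_density([0.0, 1.2, 0.0, 0.0]))
```

Read the other way, a density of 0.293% means the feature is active on roughly 0.00293 × 1,000 ≈ 2.93 tokens per thousand.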

    No Known Activations