    Negative Logits
    Token              Logit
    "<bos>"            -0.63
    " feroit"          -0.61
    "InjectAttribute"  -0.60
    " ſur"             -0.59
    " Minaj"           -0.57
    " Chrift"          -0.56
    "ftagPool"         -0.54
    " himſelf"         -0.54
    "mainAxisSize"     -0.54
    "Kristin"          -0.53
    Positive Logits
    Token              Logit
    "],"               0.97
    "]$,"              0.69
    " ],"              0.69
    "[],"              0.68
    " [],"             0.64
    "},"               0.64
    "'],"              0.63
    " \%,"             0.62
    ".],"              0.60
    "()],"             0.60
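The negative and positive logit lists rank vocabulary tokens by how strongly this feature appears to push their logits down or up. Assuming the common sparse-autoencoder dashboard convention, in which a feature's decoder direction is multiplied by the model's unembedding matrix and the most negative and most positive entries are reported, the ranking can be sketched as below. The function names, the toy matrices, and the use of a raw dot product (no layer-norm scaling or normalization) are illustrative assumptions, not the dashboard's actual code.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption (hypothetical layout): unembed is [d_model][vocab_size] and the
    score is a raw dot product, with no layer-norm scaling or normalization.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir and unembed must share d_model")
    n_vocab = len(unembed[0]) if unembed else 0
    scores = [0.0] * n_vocab
    # Accumulate weight * row for every residual-stream dimension.
    for weight, row in zip(decoder_dir, unembed):
        for t in range(n_vocab):
            scores[t] += weight * row[t]
    return scores


def top_and_bottom(
    scores: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending
    bottom = [(vocab[t], scores[t]) for t in order[:k]]
    start = max(0, len(order) - k)  # guard k == 0 and k > vocab size
    top = [(vocab[t], scores[t]) for t in reversed(order[start:])]
    return top, bottom


if __name__ == "__main__":
    vocab = ["a", "b", "c"]  # hypothetical toy vocabulary
    scores = logit_effects([1.0, 0.5], [[0.2, -0.4, 0.6], [0.4, 0.2, -0.2]])
    top, bottom = top_and_bottom(scores, vocab, k=2)
    print("positive:", top)
    print("negative:", bottom)
```

Sorting the full score vector is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` avoids the full sort.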
    Activation Density: 0.045%
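Activation density is typically reported as the share of token positions on which a feature is active. A minimal sketch, assuming a position counts as active when its activation is strictly greater than a threshold of 0 (the function name and the `threshold` parameter are hypothetical, not taken from the dashboard):

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than `threshold` (0.0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * active / total


if __name__ == "__main__":
    # Hypothetical toy data: 9 active positions out of 20,000.
    acts = [0.0] * 19_991 + [1.5] * 9
    print(f"{activation_density(acts):.3f}%")
```

Under this definition, a density of 0.045% corresponds to roughly 45 active positions per 100,000 tokens, so the feature fires very rarely.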

    No Known Activations