    Negative Logits
    Token             Logit
    " revolt"         -0.71
    " infl"           -0.67
    " corrections"    -0.66
    " bloom"          -0.65
    " outweigh"       -0.63
    "inately"         -0.63
    " prolifer"       -0.63
    " insurrection"   -0.63
    " unfl"           -0.62
    " numer"          -0.62
    Positive Logits
    Token             Logit
    "00"              1.92
    "30"              1.79
    "59"              1.61
    "45"              1.61
    "15"              1.40
    "05"              1.35
    "55"              1.35
    "50"              1.28
    "40"              1.27
    "01"              1.26
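    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A common way to derive such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix W_U and keep the k highest and k lowest entries. The sketch below does this in plain Python; `logit_weights`, `top_logits`, and every number in the toy example are hypothetical and chosen only for illustration. Real dashboards may also fold in layer-norm scaling or other details not shown here.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper (assumed definition): one weight per vocabulary token,
# obtained by projecting the feature's decoder direction (length d_model)
# through an unembedding matrix W_U of shape d_model x vocab_size.
def logit_weights(
    decoder_dir: Sequence[float], unembed: Sequence[Sequence[float]]
) -> List[float]:
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

# Hypothetical helper: the k largest weights form the positive list and
# the k smallest (most negative) weights form the negative list.
def top_logits(
    weights: Sequence[float], tokens: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary
# (all values invented for illustration).
tokens = ["00", "30", " revolt", " bloom"]
decoder_dir = [1.0, -0.5]
unembed = [
    [1.0, 0.75, -0.25, 0.125],
    [-0.5, -0.5, 1.0, 0.5],
]
positive, negative = top_logits(logit_weights(decoder_dir, unembed), tokens, k=2)
print(positive)  # [('00', 1.25), ('30', 1.0)]
print(negative)  # [(' revolt', -0.75), (' bloom', -0.125)]
```

    Note that tokens such as " revolt" carry a leading space as part of the token string, which is why they are quoted in the tables.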
    Act Density 0.030%
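    Read as a percentage, an activation density of 0.030% means the feature fires on roughly 3 of every 10,000 tokens. The sketch below assumes density is defined as the percentage of sampled tokens whose activation exceeds a threshold of 0.0; the function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence

# Assumed definition: percentage of sampled tokens on which the feature's
# activation is strictly greater than `threshold` (0.0 by default).
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# 3 firing tokens out of 10,000 sampled tokens gives 0.030%.
acts = [0.0] * 10_000
for i in (17, 4_242, 9_001):
    acts[i] = 1.5  # invented activation values
print(f"{activation_density(acts):.3f}%")  # 0.030%
```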

    No Known Activations