INDEX
    Explanations
    NEGATIVE LOGITS
     more          0.99
     people        0.97
     can           0.90
     long          0.89
     support       0.87
     seemingly     0.87
     all           0.86
     we            0.85
     content       0.83
     it            0.83
    POSITIVE LOGITS
    <unused745>     1.43
    <unused224>     1.43
    <unused409>     1.40
    <unused532>     1.40
    <unused1033>    1.39
    <unused1908>    1.35
                    1.34
    <unused379>     1.34
    <unused1987>    1.33
    <unused1013>    1.33
    Act Density 0.015%

    No Known Activations