INDEX
    Negative Logits
    Token           Logit
    "She"           1.11
    "That"          0.99
    "Pass"          0.91
    "unun"          0.90
    "Way"           0.87
    "Definition"    0.86
    "Okay"          0.86
    "Durch"         0.86
    "Is"            0.85
    "When"          0.85
    Positive Logits
    Token           Logit
    " e"            1.58
    " E"            1.14
    " see"          1.05
    " unlikely"     0.96
    " improve"      0.96
    " helpful"      0.95
    " highly"       0.94
    " emerge"       0.92
    " enhance"      0.91
    " eg"           0.90
    Activation Density: 0.100%

    No Known Activations