INDEX
    Explanations
    Negative Logits
    Token        Logit
    "ophone"     -0.74
    " ancest"    -0.67
    "enza"       -0.64
    " arose"     -0.62
    "avorite"    -0.61
    "igon"       -0.61
    " arisen"    -0.61
    " Danger"    -0.60
    "plane"      -0.60
    "eways"      -0.60
    Positive Logits
    Token        Logit
    " 02"        0.77
    "01"         0.77
    "jun"        0.76
    " 08"        0.75
    "1027"       0.74
    "28"         0.73
    "09"         0.73
    " 04"        0.72
    " 09"        0.72
    "08"         0.72
    Activation Density: 0.015%

    No Known Activations