INDEX
    Explanations
    Negative Logits
    "ffic"            -0.78
    "yout"            -0.77
    "sembly"          -0.77
    "qqa"             -0.72
    "Effective"       -0.71
    "iery"            -0.70
    "olphins"         -0.70
    "ramid"           -0.69
    "conservancy"     -0.68
    "uminati"         -0.67
    Positive Logits
    "'s"              0.95
    " Weasley"        0.88
    " realizes"       0.87
    " knows"          0.86
    " remembers"      0.86
    " survived"       0.82
    " alone"          0.82
    " knew"           0.80
    " herself"        0.80
    " disappeared"    0.79
    Activation Density: 0.125%

    No Known Activations