INDEX
    Explanations
    Negative Logits
        "leo"        -0.15
        "chten"      -0.15
        "anne"       -0.14
        "ĩ"          -0.14
        "elli"       -0.14
        "edition"    -0.14
        "ouston"     -0.14
        "au"         -0.14
        "aison"      -0.14
        "verter"     -0.14
    Positive Logits
        "-Fi"        0.33
        "-fi"        0.29
        " Fi"        0.25
        "wi"         0.22
        " fi"        0.21
        " fidelity"  0.20
        "FI"         0.19
        "Fi"         0.19
        " wi"        0.18
        "Wi"         0.18
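A minimal sketch of how ranked logit lists like the two above are typically produced, assuming (this page does not say so) that each value is the feature's decoder direction dotted with a token's unembedding vector, the usual convention for SAE feature dashboards. The function names and all vectors are hypothetical toy numbers, not this feature's actual weights.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: a feature's "logits" are assumed to be its decoder
# direction dotted with each token's unembedding vector (direct logit effect).

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct effect of the feature direction on each token's logit."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = ranked[::-1][:k]  # largest effects first
    negative = ranked[:k]        # most negative effects first
    return positive, negative

# Toy numbers only: a 3-dimensional residual stream and four tokens.
decoder_dir = [1.0, 0.0, -0.5]
unembed = {
    "-Fi": [0.3, 0.1, 0.0],
    " Fi": [0.2, 0.0, -0.1],
    "leo": [-0.1, 0.2, 0.1],
    "au":  [0.0, 0.5, 0.2],
}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
for tok, val in positive + negative:
    print(f"{tok!r:>8} {val:+.2f}")
```

Quoting the tokens with `!r` keeps leading spaces visible, which matters here because `"Fi"` and `" Fi"` are distinct vocabulary entries.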
    Activation Density: 0.005%

    No Known Activations
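For scale, a density of 0.005% is 5 × 10⁻⁵, roughly one firing every 20,000 tokens. The sketch below assumes density means the fraction of sampled tokens on which the feature's activation is above zero; the `activation_density` function and its `threshold` parameter are hypothetical illustrations, not the dashboard's own code.

```python
from typing import Sequence

# Hypothetical sketch: activation density taken as the fraction of sampled
# tokens whose feature activation exceeds a threshold (assumed to be 0).

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`."""
    if len(activations) == 0:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# One firing in 20,000 tokens gives the density shown above.
acts = [0.0] * 19_999 + [1.3]
density = activation_density(acts)
print(f"{density:.3%}")  # → 0.005%
```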