INDEX
    Explanations
    No Explanations Found
    Negative Logits
        ".synthetic"   -0.16
        "sth"          -0.15
        "ocos"         -0.15
        "ivel"         -0.15
        "anax"         -0.14
        "icus"         -0.14
        " Franken"     -0.14
        "elin"         -0.14
        "amik"         -0.14
        "eteria"       -0.14
    Positive Logits
        "heard"         0.34
        " å¤"           0.32
        "hea"           0.30
        " bead"         0.30
        " head"         0.29
        "-head"         0.28
        " heard"        0.28
        "head"          0.28
        "ead"           0.26
        "_head"         0.26
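The page does not say how these logit lists were produced. One common approach for sparse-autoencoder features, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and rank vocabulary tokens by the resulting score. The names `logit_effects` and `top_and_bottom` and the toy matrices below are hypothetical and do not come from the dashboard.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_row: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project one feature's decoder direction (length d_model) through an
    unembedding matrix (d_model rows x vocab_size columns), giving one score per token."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][j] for i in range(len(decoder_row)))
        for j in range(vocab_size)
    ]

def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    pairs = list(zip(vocab, scores))
    top = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    bottom = sorted(pairs, key=lambda p: p[1])[:k]
    return top, bottom

# Toy example with d_model = 2 and a 3-token vocabulary (hypothetical values).
decoder_row = [1.0, 2.0]
unembed = [[1.0, -1.0, 0.0],
           [0.5, 0.0, -2.0]]
vocab = [" head", "sth", ".synthetic"]
scores = logit_effects(decoder_row, unembed)
top, bottom = top_and_bottom(scores, vocab, k=1)
```

With a real model the same ranking would run over the full vocabulary, usually with a tensor library instead of nested Python loops.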
    Activation Density: 0.085%
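Activation density is usually read as the share of tokens on which the feature fires. The sketch below assumes that "fires" means an activation strictly above a threshold of zero; both the threshold and the helper name `activation_density` are assumptions, not taken from the dashboard. For example, 17 firing tokens out of 20,000 give 17 / 20,000 = 0.085%.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical sample: 20,000 tokens, of which 17 have a positive activation.
sample = [0.0] * 19_983 + [1.0] * 17
density = activation_density(sample)
```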

    No Known Activations