    Negative Logits
    historias      0.52
    horr           0.50
    grotes         0.49
    verhaal        0.49
    anecd          0.48
    majest         0.48
    bravely        0.47
    punctu         0.47
    caval          0.46
    Susie          0.46
    Positive Logits
    <unused642>    0.71
    <unused411>    0.64
    <unused1170>   0.64
    <unused2029>   0.61
    <unused351>    0.61
    <unused264>    0.59
    <unused1917>   0.59
    <unused201>    0.58
    <unused272>    0.58
    <unused352>    0.57
    Activation Density: 0.000%

    No Known Activations