    Negative Logits
        " pro"                   -0.42
        (token not rendered)     -0.37
        " “"                     -0.36
        "Mucha"                  -0.35
        " ab"                    -0.35
        "ורים"                   -0.35
        " juist"                 -0.34
        "teryx"                  -0.34
        "FilterChain"            -0.34
        " bio"                   -0.34
    Positive Logits
        "9"                       0.90
        "<bos>"                   0.90
        "6"                       0.88
        "8"                       0.88
        "7"                       0.88
        "4"                       0.82
        " betweenstory"           0.82
        "3"                       0.80
        "5"                       0.79
        " <>","                   0.79
    Activation density: 0.146%

    No Known Activations