INDEX
    Explanations

    references to surges or increases in context

    Negative Logits
    Token      Logit
    f          -0.18
    orable     -0.18
    velop      -0.17
    arih       -0.17
    lords      -0.17
    urator     -0.16
    fuse       -0.16
    جام        -0.16
    \<^        -0.15
    ryption    -0.15
    Positive Logits
    Token      Logit
    ging       0.24
    charges    0.22
    feit       0.21
    mount      0.20
    rog        0.19
    tí         0.19
    ges        0.19
    r          0.19
    tir        0.18
    er         0.18
    Act Density 0.004%

    No Known Activations