INDEX
    NEGATIVE LOGITS
    )".              0.51
    них              0.46
    (blank)          0.45
    (blank)          0.45
    (blank)          0.44
     церковь         0.43
     Phenomenology   0.43
    нами             0.42
    س                0.42
    یان              0.41
    POSITIVE LOGITS
    in    0.86
    e     0.74
    im    0.72
    ik    0.71
    at    0.70
    as    0.63
    il    0.61
    ah    0.59
    a     0.59
    or    0.58
    Activation Density 0.128%

    No Known Activations