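    Negative Logits
        (token not shown)   1.30
        "н"                 1.05
        (token not shown)   1.03
        "ف"                 1.00
        (token not shown)   0.99
        (token not shown)   0.96
        "к"                 0.94
        "м"                 0.93
        "م"                 0.91
        (token not shown)   0.91

    Positive Logits
        "at"                0.95
        "for"               0.92
        " {"                0.88
        "দের"               0.86
        " for"              0.84
        "s"                 0.78
        " while"            0.77
        "arı"               0.76
        " {."               0.76
        (token not shown)   0.73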
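Ranked positive and negative logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest scores. The page does not say how these particular values were computed, so the following is only a minimal sketch under that assumption. The function names `logit_effects` and `top_k`, and the toy two-dimensional vectors, are hypothetical and stand in for a real model's weights.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (hypothetical) feature
    decoder direction with that token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_k(effects: Dict[str, float],
          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring (positive) and k lowest-scoring
    (negative) tokens, each list ordered from most to least extreme."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy example: a 2-d "residual stream" and a four-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5], "d": [-1.0, 0.0]}
pos, neg = top_k(logit_effects(decoder_dir, unembed), k=2)
print(pos)  # [('a', 1.0), ('c', 0.25)]
print(neg)  # [('d', -1.0), ('b', -0.5)]
```

In a real model the decoder direction and unembedding would have the residual-stream width and the full vocabulary, but the ranking step is the same.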
    Activation density: 0.011%
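Activation density is usually read as the fraction of token positions on which the feature fires. A density of 0.011% corresponds to 0.00011, or about 11 firing positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 11 firing positions out of 100,000 tokens -> 0.011%
acts = [0.0] * 99_989 + [0.5] * 11
print(f"{activation_density(acts) * 100:.3f}%")  # 0.011%
```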

    No Known Activations