INDEX
    Negative Logits
        "𝓙"                  2.87
        (no visible token)   2.75
        " busted"            2.71
        " profiter"          2.57
        " heartbeat"         2.57
        (no visible token)   2.49
        " scrib"             2.47
        " gep"               2.46
        "Ки"                 2.44
        "间的"                2.42
    Positive Logits
        "ي"                  4.82
        "י"                  3.67
        "ి"                  3.62
        (no visible token)   3.43
        "ি"                  3.39
        (no visible token)   2.94
        "ו"                  2.88
        " việc"              2.79
        (no visible token)   2.77
        "er"                 2.68
    Act Density 0.024%

    No Known Activations