INDEX
    Explanations

    Chinese, Spanish, and English prefixes

    Negative Logits
        Token               Logit
        "MUX"               0.38
        "regarding"         0.37
        " lâm"              0.37
        " দমন"              0.36
        "有關"              0.36
        "ರಿಗೆ"              0.35
        " concerning"       0.35
        "ureshi"            0.34
        (no token text)     0.34
        " Exempt"           0.34
    Positive Logits
        Token               Logit
        "一下"              0.50
        "Vice"              0.44
        (no token text)     0.43
        "वस्था"             0.39
        " Stable"           0.38
        "看一下"            0.38
        "Under"             0.38
        "vice"              0.38
        "跑步"              0.38
        "Chair"             0.37
    Act Density 0.004%

    No Known Activations