INDEX
    Explanations

    专家的 (expert's), 政治 (politics), government

    New Auto-Interp
    Negative Logits
    ت       1.69
    ר       1.52
    ی       1.48
    т       1.47
    UR      1.40
    י       1.34
    RE      1.33
    5       1.22
            1.22
    ل       1.19
    Positive Logits
    on      1.34
     (      1.13
    <0xA8>  1.02
            0.89
            0.86
     h      0.86
    h       0.85
    <0xB5>  0.84
     från   0.83
    <0x91>  0.83
    Act Density 0.000%

    No Known Activations