INDEX
    Explanations

    unfair features or changes

    Negative Logits
    茶叶
    0.38
    電源
    0.37
    ijden
    0.36
     setempat
    0.36
    0.36
     startled
    0.35
     ovip
    0.34
    电源
    0.34
    关节
    0.34
    各自
    0.34
    Positive Logits
     botched
    0.55
     unfairly
    0.54
     unfair
    0.50
     fanbase
    0.48
     mandated
    0.45
     revamped
    0.44
     needlessly
    0.44
     outrage
    0.44
     layoffs
    0.44
     hype
    0.43
    Act Density 0.137%

    No Known Activations