INDEX
    Negative Logits
        Token          Logit
        "🩹"           -0.83
        "張り"         -0.76
        "endete"       -0.75
        " 风格"        -0.73
        "iotensin"     -0.72
        "Slime"        -0.71
        " 鲜"          -0.71
        "нных"         -0.70
        "Bristol"      -0.69
        " Bancroft"    -0.69
    Positive Logits
        Token          Logit
        " fire"        3.77
        " flames"      3.47
        " flame"       3.36
        " burning"     2.95
        "fire"         2.91
        " fires"       2.84
        " Fire"        2.81
        "Fire"         2.75
                       2.69
        "flame"        2.67
    Activation Density: 0.097%

    No Known Activations