    Explanations

    comparative phrases indicating a significant increase or degree

    Negative Logits
    ãĥĨãĥ«    -0.17
    sworth    -0.17
    omer      -0.16
    dep       -0.15
    rais      -0.15
    iness     -0.14
    ebi       -0.14
    asaki     -0.14
    _PRI      -0.14
    spot      -0.14
    Positive Logits
    -grand    0.21
    åı·       0.19
     ölçüde   0.18
    sword     0.17
    spender   0.16
    odus      0.16
    687       0.16
     dane     0.15
    ened      0.15
    atsby     0.15
    Activation Density: 0.030%

    No Known Activations