INDEX
    Explanations

    comparative and descriptive adjectives

    Negative Logits
    " Belly"  0.43
    "belly"  0.42
    " belly"  0.41
    "MOT"  0.40
    "पों"  0.40
    " abandonar"  0.40
    "동안"  0.39
    0.39
    "ベー"  0.38
    " মডেল"  0.38
    Positive Logits
    "üs"  0.41
    "</li>"  0.41
    " Emir"  0.40
    " blasp"  0.39
    0.39
    " minggu"  0.38
    "ogo"  0.38
    "ноп"  0.38
    "શ્ર"  0.38
    "arik"  0.38
    Act Density 38.564%

    No Known Activations