    Explanations

    phrases that express contrast or exception

    Negative Logits
    Token        Logit
    "ſelf"       -0.90
    " Dorian"    -0.83
    " Galicia"   -0.75
    " Eros"      -0.75
    " Caine"     -0.74
    "yayım"      -0.73
    " Aire"      -0.73
    " PHA"       -0.73
    " Jefus"     -0.71
    " Jeune"     -0.71
    Positive Logits
    Token        Logit
    " but"       2.40
    " But"       2.28
    "but"        2.19
    " BUT"       2.05
    "But"        2.05
    "BUT"        1.81
    " pero"      1.68   (Spanish "but")
    " tetapi"    1.49   (Indonesian/Malay "but")
    " nhưng"     1.47   (Vietnamese "but")
                 1.43
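    The two lists above are the tokens this feature most promotes and most suppresses. A minimal sketch of how such lists can be produced, assuming a per-token score has already been computed for the whole vocabulary (for example by projecting the feature's decoder direction onto the model's unembedding matrix, which is an assumption here and not stated on this page). The function name top_and_bottom_tokens is hypothetical; the sample scores are a few values copied from the lists above.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's per-token logit score.
# Assumes `scores` maps token strings to precomputed scores (how they are computed
# is an assumption, e.g. decoder direction projected onto the unembedding).

def top_and_bottom_tokens(scores, k=10):
    """Return (positive, negative) lists of (token, score) pairs.

    positive: the k highest-scoring tokens, highest first.
    negative: the k lowest-scoring tokens, most negative first.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending by score
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# A few (token, score) pairs taken from the lists above; leading spaces are part of the token.
scores = {
    " but": 2.40,
    " But": 2.28,
    " pero": 1.68,
    "ſelf": -0.90,
    " Dorian": -0.83,
}
positive, negative = top_and_bottom_tokens(scores, k=2)
print(positive)  # [(' but', 2.4), (' But', 2.28)]
print(negative)  # [('ſelf', -0.9), (' Dorian', -0.83)]
```

    With a full vocabulary, heapq.nlargest and heapq.nsmallest would avoid sorting every token; a plain sort keeps the sketch short.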
    Activation Density 0.128%
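    A density of 0.128% means the feature is active on roughly 128 of every 100,000 tokens. A minimal sketch of the calculation, assuming density is the percentage of token positions whose activation is strictly above zero (the exact threshold and the dataset it is measured over are assumptions, not stated here). The function name activation_density is hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of token positions
# where the feature's activation exceeds a threshold (assumed to be 0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions with activation > threshold."""
    if not activations:
        return 0.0  # no tokens observed, so report zero density
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 1,000 token positions, of which 3 are active -> 0.3%
acts = [0.0] * 997 + [1.5, 0.2, 3.1]
print(activation_density(acts))
```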

    No Known Activations