    Explanations

    instances of the word "rather" indicating preference or comparison

    Negative Logits
    Token        Logit
    "swer"       -0.18
    "sse"        -0.15
    "ray"        -0.15
    "vla"        -0.15
    " Arbor"     -0.15
    "ys"         -0.14
    "system"     -0.14
    "ital"       -0.14
    "initely"    -0.14
    " barely"    -0.14
    Positive Logits
    Token        Logit
    " than"      0.40
    "-than"      0.29
    "than"       0.28
    " THAN"      0.28
    "_than"      0.27
    "Than"       0.27
    " Than"      0.26
    " než"       0.24
    " än"        0.22
    " quam"      0.22
    Activation Density: 0.016%

    No Known Activations