    Explanations

    instances of comparisons suggesting a decrease or reduction in magnitude

    phrases indicating a comparison or measure of quantity

    Negative Logits
    Token               Logit
    "TRY"               -0.69
    " Reconstruction"   -0.66
    " Origins"          -0.66
    "DD"                -0.61
    "AE"                -0.61
    " den"              -0.61
    "◼"                 -0.60
    "kamp"              -0.57
    "DK"                -0.57
    "RL"                -0.57
    Positive Logits
    Token               Logit
    "ened"              0.98
    " than"             0.96
    "ening"             0.87
    "thumbnails"        0.82
    " Than"             0.75
    "ainers"            0.72
    "expensive"         0.72
    "ons"               0.70
    "ensive"            0.69
    " fortunate"        0.69
    Activation Density: 0.032%

    No Known Activations