INDEX
    Explanations

    fractions or ratios

    phrases indicating proportions or ratios

    NEGATIVE LOGITS
    irez        -0.61
    andals      -0.60
    Offline     -0.59
    ymm         -0.59
    idel        -0.56
    isl         -0.55
    ourgeois    -0.54
    voice       -0.54
    hai         -0.54
    },"         -0.54
    POSITIVE LOGITS
     every      0.90
     ten        0.79
     100        0.77
     tens       0.74
     equals     0.70
     bounds     0.69
     10         0.69
     365        0.68
     435        0.68
     189        0.67
    Activation Density: 0.032%

    No Known Activations