INDEX
    Explanations

    occurrences of the word 'if'

    Negative Logits
    "ORPG"            -0.83
    "anon"            -0.78
    "IDER"            -0.77
    " Flavoring"      -0.75
    "Rated"           -0.73
    "vantage"         -0.71
    "Domin"           -0.69
    "ricks"           -0.69
    "ESE"             -0.68
    "unes"            -0.67

    Positive Logits
    "yip"              0.72
    " imperfect"       0.69
    " theoretically"   0.69
    " they"            0.69
    " technically"     0.68
    "plin"             0.68
    " materially"      0.68
    "fy"               0.66
    " aspir"           0.65
    " you"             0.65
    Act Density 0.025%

    No Known Activations