INDEX
    Explanations

    repeated instances of the word "that."

    Negative Logits
     loop             -0.58
     Pitts            -0.52
    otro              -0.52
    ensk              -0.51
     BOS              -0.50
    tagHelperRunner   -0.50
    stype             -0.50
     lue              -0.49
     volks            -0.48
     instant          -0.48
    Positive Logits
     bahawa           0.56
     efectivamente    0.55
    ormais            0.55
     bahwa            0.54
     puissent         0.52
     mães             0.52
     embarazadas      0.51
     avulla           0.50
     kwamba           0.48
     chociaż          0.48
    Activation Density: 0.373%

    No Known Activations