INDEX
    Explanations
    NEGATIVE LOGITS
    Token         Logit
    " obvious"    -0.07
    " Fallon"     -0.07
    " Friedman"   -0.07
    "?>"          -0.06
    " Epstein"    -0.06
    " był"        -0.06
    "02"          -0.06
    "03"          -0.06
    " Wong"       -0.06
    " bait"       -0.06
    POSITIVE LOGITS
    Token         Logit
    " Terra"      0.13
    " Terr"       0.13
    " terr"       0.13
    " terra"      0.12
    "Terr"        0.11
    "err"         0.09
    " Terry"      0.09
    " Terrace"    0.09
    "terra"       0.09
    " terrace"    0.09
    Act Density 0.006%

    No Known Activations