INDEX
    Explanations

    Removing/sacrificing things

    Negative Logits
    Token          Logit
    " Sudoku"      -0.07
    (blank)        -0.06
    "economic"     -0.06
    "言って"       -0.06
    " Kindle"      -0.06
    "Tonight"      -0.06
    " Geile"       -0.06
    "ΩΤ"           -0.06
    (blank)        -0.06
    ".LINE"        -0.06
    Positive Logits
    Token          Logit
    " apache"      0.07
    " downgrade"   0.07
    " creams"      0.07
    " chim"        0.07
    ".coin"        0.06
    "bool"         0.06
    " Hoff"        0.06
    " persec"      0.06
    " Anyone"      0.06
    " mean"        0.06
    Activation Density: 0.069%

    No Known Activations