    Negative Logits
    " itſelf"            -0.85
    (blank)              -0.85
    " Monfieur"          -0.85
    " myſelf"            -0.84
    " Theſe"             -0.84
    " Efq"               -0.83
    " Paglinawan"        -0.83
    "SequentialGroup"    -0.82
    " themſelves"        -0.80
    "<bos>"              -0.79
    Positive Logits
    (not captured)       0.48
    "↵↵"                 0.45
    "TER"                0.45
    "Superficie"         0.43
    "ense"               0.42
    "ter"                0.40
    "an"                 0.40
    "erde"               0.40
    "mosa"               0.39
    " سرا"               0.39
    Act Density 0.033%

    No Known Activations