    Negative Logits
        token          logit
                       -0.62
        " zoll"        -0.57
        " alfo"        -0.54
        " habang"      -0.54
        " confider"    -0.52
        " bensin"      -0.51
        " ftill"       -0.51
        " isoli"       -0.50
        "горит"        -0.50
        " caufe"       -0.49

    Positive Logits
        token          logit
        " thank"        1.01
        "thank"         0.98
        "Thank"         0.97
        " Thank"        0.95
        " THANK"        0.93
        "THANK"         0.85
        " thanking"     0.74
        "thanks"        0.74
        " Thanks"       0.70
        " thanks"       0.70
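The logit lists above appear to rank vocabulary tokens by how strongly this feature pushes their next-token logits down (negative) or up (positive). A minimal sketch of how such a ranking can be produced, assuming (as is common for sparse-autoencoder feature dashboards, though this page does not say so) that each value is the dot product of the feature's decoder direction with one column of the model's unembedding matrix. The names `logit_effects`, `top_logit_tokens`, `decoder_dir`, and `unembed` are hypothetical, and the toy numbers are illustrative only, not taken from this feature.

```python
from typing import Sequence


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
) -> list[float]:
    """Project a feature direction onto every vocabulary logit.

    Assumed layout (hypothetical): `unembed` has one row per model dimension
    and one column per vocabulary token, so the effect on token t is
    sum_j decoder_dir[j] * unembed[j][t].
    """
    if not unembed:
        raise ValueError("unembed must have at least one row")
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must match the number of unembed rows")
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[j] * unembed[j][t] for j in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logit_tokens(
    effects: Sequence[float],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    if len(effects) != len(vocab):
        raise ValueError("effects and vocab must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    # Walk the same ordering backwards to get the largest effects first.
    positive = [(vocab[t], effects[t]) for t in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional model with a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -1.0]
unembed = [
    [0.5, 0.0, 1.0, -0.2],
    [0.1, 0.3, -0.4, 0.0],
]
negative, positive = top_logit_tokens(logit_effects(decoder_dir, unembed), vocab, k=2)
print([tok for tok, _ in negative])  # ['b', 'd']
print([tok for tok, _ in positive])  # ['c', 'a']
```

Sorting the full vocabulary once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.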
    Activation density: 0.029%
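The density figure above can be read as the share of tokens on which the feature fires: 0.029% is roughly 29 tokens in every 100,000. A minimal sketch of that calculation, assuming "fires" means an activation strictly above zero (the dashboard does not state its threshold). The function name `activation_density` is hypothetical and the activation list is synthetic.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return active / total


# Synthetic data: 29 non-zero activations spread over 100,000 tokens.
acts = [0.0] * 100_000
for i in range(29):
    acts[i * 1000] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.029%
```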

    No Known Activations