INDEX
    Explanations
    Negative Logits
    Token              Logit
    (token missing)    -0.59
    "//"               -0.58
    "neurial"          -0.58
    " perc"            -0.57
    (token missing)    -0.56
    "----------"       -0.56
    " traiter"         -0.56
    "wort"             -0.55
    "rahat"            -0.55
    (token missing)    -0.55
    Positive Logits
    Token              Logit
    " Stadium"          3.07
    " stadium"          2.80
    "Stadium"           2.59
    "stadium"           2.50
    " stadiums"         2.48
    " estadio"          1.61
    " stadion"          1.51
    " arena"            1.47
    " Arena"            1.46
    "stadion"           1.44
    Activation Density: 0.048%

    No Known Activations