INDEX
    Explanations

    names of political figures and locations

    Negative Logits
    etheless      -0.81
     sight        -0.70
     directions   -0.69
    ãģį           -0.68
     limited      -0.68
     fringe       -0.66
     stroke       -0.65
    cipline       -0.65
     pinch        -0.65
    lihood        -0.65
    Positive Logits
    isha          1.17
    ona           1.10
    amon          1.05
    onda          1.02
    istar         1.00
    lem           0.99
    anta          0.98
    ira           0.97
    ussia         0.97
    iba           0.97
    Activation Density: 1.277%

    No Known Activations