INDEX
    Explanations

    entities or names in a text-based setting

    Negative Logits
    hyde      -0.93
    weight    -0.90
    axe       -0.89
    agne      -0.87
    ijn       -0.85
     Fernand  -0.83
    gran      -0.82
    mson      -0.81
    pared     -0.81
     Kant     -0.79
    Positive Logits
    IOR         1.17
    idia        1.10
    ANC         1.09
    RL          1.09
    ARA         1.02
    ERC         1.00
    CLA         0.99
    ITE         0.98
    IZ          0.98
    vironment   0.97
    Activation Density: 0.148%

    No Known Activations