INDEX
    Explanations

    proper nouns such as names of people, places, organizations, and titles

    Negative Logits
     yoda       -0.93
     pixar      -0.84
     soeur      -0.81
     monstre    -0.81
     pikachu    -0.80
     gardien    -0.78
     😭😭       -0.76
     Mère       -0.76
     broderie   -0.75
     bieber     -0.74
    Positive Logits
     makro      0.78
                0.74
     Fö         0.73
     Nö         0.72
     ideolog    0.72
     saar       0.70
    Fakta       0.69
     Jä         0.69
     alkoh      0.69
     Schrö      0.68
    Act Density 0.416%

    No Known Activations