INDEX
    Explanations

    references to individuals or groups of people

    Negative Logits
     Савезне            -1.02
     myſelf             -0.81
     Monfieur           -0.76
    Vidite              -0.73
     CreateTagHelper    -0.72
    érience             -0.71
     ainfi              -0.71
     ſche               -0.70
     himſelf            -0.70
     Anſ                -0.68
    Positive Logits
    它們       0.86
    它们       0.80
     its      0.73
    它们的     0.72
     it       0.72
     their    0.70
              0.64
     它       0.61
     їх       0.60
    their     0.56
    Act Density 0.407%

    No Known Activations