INDEX
    Explanations

    phrases emphasizing relationships and connections between individuals or groups

    Negative Logits
     itſelf
    -1.13
     ſtate
    -1.02
     themſelves
    -0.99
     pleaſure
    -0.98
     Jefus
    -0.97
     myſelf
    -0.97
    ſelf
    -0.96
     Majefty
    -0.92
     occaf
    -0.92
     Efq
    -0.90
    Positive Logits
     two
    1.10
    two
    1.09
     Two
    1.05
    Two
    1.01
    TWO
    1.00
     TWO
    1.00
     שני
    0.95
     zwei
    0.95
     deux
    0.89
     δύο
    0.89
    Activation Density: 0.086%

    No Known Activations