    Explanations

    references to "other" entities or categories in various contexts

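    Negative Logits
    (Tokens are quoted so that leading spaces are visible; two token strings were lost in extraction.)
    Token            Logit
    " itself"        -0.47
    "itself"         -0.46
    [missing token]  -0.41
    " itſelf"        -0.39   (archaic long-s spelling of "itself")
    [missing token]  -0.39
    " itulah"        -0.39   (Indonesian/Malay, roughly "that is")
    " both"          -0.38
    "之旅"            -0.35   (Chinese, "journey")
    "something"      -0.35
    "PhysRevD"       -0.34   (abbreviation of the journal Physical Review D)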
    Positive Logits
    Token            Logit
    "worldly"        1.34
    " than"          0.98
    " niż"           0.88   (Polish, "than")
    " similarly"     0.77
    " equally"       0.77
    " decât"         0.77   (Romanian, "than")
    " THAN"          0.74
    " similar"       0.73
    "similar"        0.69
    " liknande"      0.68   (Swedish, "similar")
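    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most extreme tokens. The exact procedure and normalization behind these particular numbers are not stated, so the following is a minimal NumPy sketch under that assumption; `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical names, and the toy values only mimic a few rows of the lists.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how much a feature's decoder direction raises or
    lowers their logits when projected through the unembedding matrix.

    decoder_dir: shape (d_model,), the feature's decoder vector (assumed input)
    W_U: shape (d_model, vocab_size), the model's unembedding matrix
    vocab: list of vocab_size token strings
    Returns (negative, positive): the k most negative and the k most positive
    (token, score) pairs, each ordered from most extreme inward.
    """
    effects = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: with W_U = identity, each score equals the matching
# entry of decoder_dir, so the ranking is easy to check by eye.
vocab = [" itself", " than", " similar", " both", " the"]
decoder_dir = np.array([-0.47, 0.98, 0.73, -0.38, 0.0])
negative, positive = top_logit_effects(decoder_dir, np.eye(5), vocab, k=2)
print(negative)  # [(' itself', -0.47), (' both', -0.38)]
print(positive)  # [(' than', 0.98), (' similar', 0.73)]
```

    In a real model, `W_U` would be the trained unembedding and `vocab` the tokenizer's vocabulary; ranking by the raw dot product is the simplest choice, and some dashboards may rescale the scores before display.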
    Activation density: 0.303%
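    Activation density is usually reported as the share of tokens in a sample corpus on which the feature's activation is above zero, so 0.303% means roughly 3 tokens in every 1,000. The sketch below assumes that definition and a zero threshold; the `activation_density` helper is hypothetical, and the corpus behind the figure above is not stated.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Return the percentage of tokens on which a feature fires.

    acts: the feature's activation on every token of a text sample
    threshold: activations strictly above this value count as firing
    (0.0 is an assumption; a dashboard may use a different cutoff)
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy sample: 3 firing tokens out of 1,000 gives a density of 0.3%.
sample = np.concatenate([np.zeros(997), [1.2, 0.4, 2.5]])
print(activation_density(sample))  # 0.3
```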

    No Known Activations