INDEX
    Explanations

    phrases indicating personal knowledge or familiarity with specific items or concepts

    NEGATIVE LOGITS
    ikit
    -0.15
    伴
    -0.15
    iki
    -0.15
    ルク
    -0.15
    rozen
    -0.14
    ipeg
    -0.14
    alama
    -0.14
    irq
    -0.14
    одерж
    -0.14
     جدا
    -0.13
    POSITIVE LOGITS
     refer
    0.42
     referring
    0.42
     refers
    0.37
     mean
    0.35
     meant
    0.34
    refer
    0.32
     Refer
    0.31
     referred
    0.30
    Mean
    0.30
     REFER
    0.29
    Act Density 0.192%

    No Known Activations