INDEX
    Explanations

    expressions related to introspection and self-reflection

    Negative Logits
    "enta"     -0.15
    "redits"   -0.15
    "iel"      -0.14
    "ίνη"      -0.13
    "eson"     -0.13
    "IGHT"     -0.13
    "635"      -0.13
    "isque"    -0.13
    "leton"    -0.13
    "913"      -0.13
    Positive Logits
    " it"      0.54
    "å®ĥ"      0.37
    " It"      0.35
    "\tit"     0.34
    "It"       0.34
    "_it"      0.32
    " nó"      0.31
    " itu"     0.27
    "it"       0.26
    ",it"      0.26
    Act Density 0.444%

    No Known Activations