INDEX
    Explanations

    instances of the word "it" in varying contexts

    Negative Logits
    Token       Logit
    ishly       -0.17
    haft        -0.15
    ï           -0.15
    ulty        -0.14
    odge        -0.14
    cout        -0.14
    sic         -0.14
    ulture      -0.14
    iston       -0.14
    hana        -0.13
    Positive Logits
    Token       Logit
    iner        0.41
    chy         0.31
    /her        0.29
    /th         0.28
    zelf        0.27
    /us         0.26
    unes        0.26
    self        0.23
    inerary     0.23
    SELF        0.23
    Activation Density: 0.184%

    No Known Activations