INDEX
    Explanations

    phrases indicative of notable actions or characteristics

    Negative Logits
    Token          Logit
    "FINE"         -0.17
    "à¥įरह"        -0.15
    "lander"       -0.14
    "raham"        -0.14
    "Overrides"    -0.14
    "ãĤ¯ãĥĪ"       -0.14
    " ifndef"      -0.13
    "aspers"       -0.13
    " Prev"        -0.13
    "elf"          -0.13
    Positive Logits
    Token          Logit
    " little"      0.40
    "little"       0.36
    " Little"      0.36
    "Little"       0.35
    "ITTLE"        0.27
    "ittle"        0.24
    " poco"        0.23
    " peu"         0.22
    " pouco"       0.20
    "ittel"        0.18
    Activation Density: 0.020%

    No Known Activations