INDEX
    Explanations

    phrases indicating expectations or predictions regarding outcomes

    Negative Logits
    -FIRST        -0.14
    wat           -0.12
    urette        -0.12
    нит           -0.12
    analyze       -0.12
    vise          -0.12
    疲            -0.12
    trys          -0.12
    ynchronize    -0.12
    atta          -0.12

    Positive Logits
     grace         0.26
     headline      0.23
     top           0.20
     duke          0.19
     land          0.19
     rank          0.19
     rival         0.19
     rule          0.18
     outs          0.18
     command       0.18
    Activation Density: 0.207%

    No Known Activations