INDEX
    Explanations

    expressions of time and long-term plans or events

    Negative Logits
    Token         Logit
    " himself"    -1.52
    "ament"       -1.52
    "eners"       -1.38
    "eu"          -1.32
    "himself"     -1.29
    "kin"         -1.24
    "aya"         -1.22
    " Episode"    -1.22
    "lan"         -1.20
    " hostage"    -1.19
    Positive Logits
    2.11
                                                   
    2.11
    č↵  
    2.11
    <|outofrange|>
    2.11
    <|outofrange|>
    2.11
    2.11
               
    2.11
    2.11
    č↵       
    2.11
    č↵        
    2.11
    Act Density 2.940%

    No Known Activations