INDEX
    Explanations

    the word "one" in various contexts

    Negative Logits
    lications    -1.84
    ))?          -1.80
    "))          -1.74
    ))\          -1.58
    terday       -1.55
    '?"          -1.55
    )))          -1.54
    )$)          -1.53
    "?           -1.52
    ))=          -1.51
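    Positive Logits
    (no visible token)         1.50
    (whitespace)               1.50
    (whitespace)               1.50
    č↵ (plus whitespace)       1.50
    (no visible token)         1.50
    (no visible token)         1.50
    (no visible token)         1.50
    (whitespace)               1.50
    (no visible token)         1.50
    <|outofrange|>             1.50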
    Act Density 0.152%

    No Known Activations