INDEX
    Explanations

    phrases related to argumentation and reasoning

    Negative Logits
    Token          Logit
    "acos"         -0.15
    "ukt"          -0.14
    "cker"         -0.14
    " narr"        -0.14
    "athon"        -0.13
    "irl"          -0.13
    "lov"          -0.13
    "Kir"          -0.13
    " Gazette"     -0.13
    "arResult"     -0.13
    Positive Logits
    Token          Logit
    "lider"        0.18
    "uce"          0.14
    "istrat"       0.14
    "ibi"          0.14
    "ULE"          0.14
    "hc"           0.14
    "ohana"        0.13
    "656"          0.13
    "ê³ł"          0.13
    "ogh"          0.13
    Activation Density: 0.241%

    No Known Activations