INDEX
    Explanations

    phrases related to assertions or claims

    Negative Logits
    "aný"         -0.15
    " gave"       -0.15
    "راÙĩ"        -0.14
    "Reviewed"    -0.14
    "egot"        -0.14
    "empor"       -0.14
    "rve"         -0.14
    "ming"        -0.13
    "rina"        -0.13
    "åΰäºĨ"       -0.13
    Positive Logits
    " widely"         0.26
    " learned"        0.26
    " established"    0.25
    " reported"       0.24
    " understood"     0.24
    " believed"       0.23
    " sur"            0.23
    " known"          0.23
    "bel"             0.23
    "wid"             0.22
    Act Density 0.105%

    No Known Activations