    Explanations

    phrases related to scientific explanations and reasoning

    Negative Logits
    "verbatim"    -0.16
    "nga"         -0.16
    "hled"        -0.13
    " ?>"/>↵"     -0.13
    "keh"         -0.13
    "ộc"          -0.13
    "eling"       -0.13
    "oplan"       -0.13
    "arga"        -0.12
    "voj"         -0.12
    Positive Logits
    " explanation"     1.13
    " explain"         1.12
    " explanations"    1.07
    " explaining"      1.05
    " explains"        1.02
    " explained"       1.00
    " Explanation"     0.97
    " Explain"         0.92
    "explain"          0.91
    "Explanation"      0.91
    Act Density 0.288%

    No Known Activations