    Explanations

    phrases related to assumptions in reasoning or logic

    Negative Logits
    [token not shown]   -0.74
    findpost            -0.72
    enderror            -0.69
     EconPapers         -0.67
    новниш              -0.66
     &___               -0.64
    RectangleBorder     -0.64
    ObjectMeta          -0.64
    tvguidetime         -0.64
     विश्वसनीयता         -0.64
    Positive Logits
     original           0.75
     originais          0.71
     originals          0.69
    original            0.65
     originales         0.63
    Original            0.62
     Original           0.59
    ORIGINAL            0.59
     ursprünglichen     0.58
     originale          0.53
    Act Density 0.080%

    No Known Activations