INDEX
    Explanations
    Negative Logits
     Highest  0.71
     최대  0.70
     Scales  0.70
     دارای  0.70
     Amino  0.69
     cwd  0.69
    报错  0.68
     पतला  0.68
     Rússia  0.67
     hoge  0.66
    Positive Logits
     metaphor  0.81
    conscious  0.77
    aphor  0.76
     discernment  0.75
    lern  0.74
     unsettling  0.73
    consc  0.70
     conscious  0.70
     embell  0.70
    angled  0.69
    Activation Density: 0.022%

    No Known Activations