INDEX
    Explanations

    phrases that express the concept of novelty or transformation

    NEGATIVE LOGITS
    recent
    -0.15
    lení
    -0.14
    Latest
    -0.14
    remen
    -0.14
     recent
    -0.13
     güncel
    -0.13
    azı
    -0.13
    čer
    -0.13
     уда
    -0.13
    §
    -0.13
    POSITIVE LOGITS
     whole
    0.78
    whole
    0.68
     entirely
    0.61
    Whole
    0.60
     Whole
    0.60
     altogether
    0.56
     entire
    0.54
     Entire
    0.49
     completely
    0.44
     cał
    0.39
    Act Density 0.163%

    No Known Activations