INDEX
    Explanations

    cult or cult-like discussions

    Negative Logits
    Token        Value
    "ле"         1.10
    "ни"         1.01
    "ла"         1.01
    "сах"        0.96
    "il"         0.95
    "лама"       0.91
    "ра"         0.87
    "лили"       0.86
    " Сасик"     0.84
    "ිය"         0.83
    Positive Logits
    Token        Value
    "t"          1.24
    "0"          1.15
    "Cult"       1.09
    (whitespace) 0.98
    "\"          0.95
    "1"          0.94
    " Cult"      0.93
    "]"          0.93
    (token lost in extraction) 0.91
    " on"        0.91
    Activation Density: 0.002%

    No Known Activations