Explanations
phrases indicating positive outcomes or success in various contexts
Negative Logits
хи
-0.18
enton
-0.17
eda
-0.15
Winn
-0.15
eniable
-0.15
ourt
-0.15
leta
-0.15
hence
-0.15
lej
-0.14
enda
-0.14
Positive Logits
because
0.35
porque
0.30
because
0.30
Because
0.27
Because
0.27
потому
0.24
karena
0.24
perché
0.23
omdat
0.22
parce
0.22
Activations Density 0.214%