INDEX
Explanations
phrases related to consequences and the impact of events or actions
Negative Logits
Ñĥж         -0.16
sobie       -0.15
openh       -0.15
@student    -0.15
rand        -0.15
uat         -0.14
hog         -0.14
aucoup      -0.14
illard      -0.14
evi         -0.14
Positive Logits
apel        0.16
sep         0.15
.compat     0.15
ména        0.15
sit         0.14
afterward   0.14
dostan      0.14
zik         0.14
Dam         0.14
frem        0.13
Activations Density 1.197%