INDEX
Explanations
phrases indicating the presence of specific events or actions
levitating words
Negative Logits
-0.58
propOrder  -0.56
مصادر  -0.54
EconPapers  -0.54
ertor  -0.50
awtextra  -0.49
pleaſure  -0.49
codiles  -0.49
ſelf  -0.48
trast  -0.47
Positive Logits
schonmal  0.52
gnancy  0.50
possu  0.49
riguardo  0.47
Anyways  0.45
urma  0.43
πως  0.43
Anyways  0.41
kì  0.41
anyways  0.41
Activations Density 0.124%