INDEX
Explanations
possessive pronouns indicating ownership or belonging
Negative Logits
raison       -0.15
owied        -0.15
ernals       -0.15
arness       -0.14
ERGY         -0.14
bắt          -0.14
iê           -0.14
baru         -0.14
.statistics  -0.14
درس          -0.14
Positive Logits
behalf       0.50
occasions    0.26
occasion     0.26
basis        0.24
heels        0.22
lap          0.20
radar        0.19
doorstep     0.19
shoulders    0.19
dime         0.19
Activations Density: 0.040%