Explanations
phrases and concepts related to legal definitions and distinctions of physical actions and their moral implications
Negative Logits
undy  -0.15
ebi  -0.15
getattr  -0.13
iola  -0.13
-haspopup  -0.13
Animalia  -0.13
keley  -0.13
ëĤĺ (나)  -0.13
ë¶ (partial Hangul byte sequence)  -0.12
è¿ĻäºĽ (这些)  -0.12
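Several token strings in these lists (ëĤĺ, ë¶, è¿ĻäºĽ, and æĹ¢ among the positive logits) look garbled because they are raw byte-level BPE vocabulary entries. The source does not name the tokenizer, but the strings match GPT-2's byte-level scheme. Assuming that scheme, they can be decoded by inverting GPT-2's byte-to-character table. A minimal sketch (`decode_token` is a hypothetical helper name):

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Printable bytes map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, 0x7F-0xA0, 0xAD) map to U+0100 onward.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}

def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into readable text."""
    raw = bytes(CHAR_TO_BYTE[c] for c in token)
    # Tokens may hold only part of a multi-byte UTF-8 character.
    return raw.decode("utf-8", errors="replace")

for tok in ["ëĤĺ", "ë¶", "è¿ĻäºĽ", "æĹ¢"]:
    print(tok, "->", decode_token(tok))
```

Under this assumption ëĤĺ decodes to 나, è¿ĻäºĽ to 这些, and æĹ¢ to 既. ë¶ holds only the first two bytes of a three-byte Hangul character, so it decodes to a replacement character.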
Positive Logits
both  0.94
both  0.85
BOTH  0.82
Both  0.72
Both  0.71
både  0.71
_both  0.62
æĹ¢ (既)  0.55
ambos  0.52
_BOTH  0.52
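Dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's output (decoder) direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The source does not say how these values were computed, so the sketch below is an assumption. `logit_effects`, `decoder_dir`, and `W_U` are hypothetical names, and the toy numbers are invented for illustration:

```python
import heapq

def logit_effects(decoder_dir, W_U, vocab, k=3):
    """Score each vocabulary token by decoder_dir @ W_U; return top-k and bottom-k.

    decoder_dir: d_model floats (hypothetical feature output direction)
    W_U: d_model rows, each a list of len(vocab) floats (unembedding matrix)
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, scores))
    top = heapq.nlargest(k, pairs, key=lambda p: p[1])      # "positive logits"
    bottom = heapq.nsmallest(k, pairs, key=lambda p: p[1])  # "negative logits"
    return top, bottom

# Toy example with invented values (d_model = 2, four-token vocabulary).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
W_U = [[0.9, 0.5, -0.1, -0.2],
       [0.3, 0.3, 0.3, 0.3]]
decoder_dir = [1.0, 0.0]
top, bottom = logit_effects(decoder_dir, W_U, vocab, k=2)
print("positive:", [t for t, _ in top])     # positive: ['tok_a', 'tok_b']
print("negative:", [t for t, _ in bottom])  # negative: ['tok_d', 'tok_c']
```

Real implementations would use a tensor library and may normalize the decoder direction first. The pure-Python loops here only show the projection-then-rank idea.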
Activation density: 0.351%
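Activation density is usually read as the share of tokens on which the feature fires, meaning its activation is above zero. Under that assumed definition, 0.351% corresponds to roughly 351 of every 100,000 tokens. A minimal sketch of the calculation (`activation_density` and its `threshold` parameter are hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy sample: the feature fires on 351 of 100,000 tokens.
sample = [1.0] * 351 + [0.0] * (100_000 - 351)
print(f"{activation_density(sample):.3f}%")  # 0.351%
```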