INDEX
Explanations
phrases related to personal identity and character development
Negative Logits
abandonment   -0.14
postpon       -0.13
indifference  -0.13
NEGLIGENCE    -0.13
delaying      -0.13
detached      -0.13
pard          -0.12
exem          -0.12
dismissing    -0.12
invalidate    -0.12
Positive Logits
restriction   0.81
restrictions  0.78
restricted    0.77
limitation    0.72
restrict      0.71
limitations   0.71
restrict      0.70
限制          0.70
restriction   0.69
restricted    0.66
Activations Density 0.723%