Explanations
concepts related to discipline and control
Negative Logits
Token     Logit
?url      -0.15
üre       -0.15
æĭľ       -0.15
aphrag    -0.14
nim       -0.14
arih      -0.14
è¨        -0.14
htags     -0.14
trÃŃ      -0.14
ari       -0.13
Positive Logits
Token         Logit
discipl       0.40
discipline    0.36
punishment    0.34
punishments   0.33
disciplinary  0.33
Discipline    0.33
spanking      0.30
pun           0.28
Pun           0.27
disciplines   0.25
Activations Density: 0.280%
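
The page does not state how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading on made-up data; the function name `activation_density`, the `threshold` parameter, and the toy activations are all hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 3 of which are active.
toy = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.280% would mean the feature is active on roughly 28 of every 10,000 token positions in the sample.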