INDEX
Explanations
references to the concept of punishment
Negative Logits
modular       -0.52
Méndez        -0.52
facade        -0.52
Thacker       -0.52
façade        -0.49
container     -0.49
Estrada       -0.49
underground   -0.49
forbear       -0.49
paranoid      -0.47
Positive Logits
punishment      0.54
ScopeManager    0.53
dear            0.51
Subject         0.48
AxisAlignment   0.45
Subjects        0.44
Subjects        0.44
neg             0.42
GroupLayout     0.42
suffers         0.42
Activations Density: 0.170%