INDEX
Explanations
language related to restrictions and limitations
Negative Logits
Token       Logit
p           -0.70
n           -0.61
—           -0.59
pearance    -0.56
cob         -0.56
mtext       -0.55
Mein        -0.52
q           -0.52
l           -0.52
sp          -0.51
Positive Logits
Token           Logit
restrictions    1.32
constraints     1.29
Constraints     1.22
Restrictions    1.20
constraint      1.17
restriction     1.15
restraints      1.14
constraints     1.08
restricting     1.07
bans            1.07
Activations Density: 0.339%