Explanations
mentions or considerations of limitations
references to the concept of limitations
Negative Logits
DA      -0.83
psc     -0.76
ron     -0.75
cow     -0.72
Corn    -0.71
patch   -0.71
zo      -0.69
rex     -0.68
eor     -0.67
borne   -0.67
Positive Logits
limitations    1.14
limitation     1.10
restrictions   0.90
limits         0.89
limiting       0.89
constraints    0.88
restraints     0.87
loopholes      0.83
restricts      0.81
hind           0.80
Activations Density: 0.009%