INDEX
Explanations
references to cages
mentions of "cage" and its related concepts
Negative Logits
Grad        -0.72
Hur         -0.70
properties  -0.69
appropri    -0.68
hur         -0.67
Redd        -0.67
MLA         -0.66
Relief      -0.66
lly         -0.65
Kot         -0.64
Positive Logits
cage         3.90
cages        3.09
Cage         1.61
enclosure    1.58
fence        1.28
crate        1.19
fences       1.08
leash        1.07
feather      1.03
compartment  1.01
Activations Density: 0.011%