INDEX
Explanations
references to gates or gate-related concepts
references to gates or barriers in various contexts
Negative Logits
Token       Logit
ortium      -0.89
enegger     -0.81
lihood      -0.81
Hots        -0.79
ensional    -0.67
issance     -0.65
inho        -0.64
ocker       -0.61
ulous       -0.61
Norm        -0.60

Positive Logits
Token       Logit
keepers     1.31
keeper      1.27
gates       1.15
ways        1.10
gate        1.10
stones      1.00
posts       0.98
keeping     0.95
house       0.94
hole        0.91
Activations Density: 0.007%