INDEX
Explanations
words related to barriers or obstacles
references to gates and gatekeepers, implying control or access points
Negative Logits
Hots        -0.85
ensional    -0.73
yles        -0.70
arus        -0.69
issance     -0.68
ortium      -0.67
Norm        -0.66
ity         -0.65
urrent      -0.62
itarian     -0.61
Positive Logits
keepers     1.38
keeper      1.37
ways        1.21
keeping     1.07
fold        1.06
posts       0.98
way         0.96
stones      0.92
hole        0.90
house       0.85
Activations Density: 0.032%