INDEX
Explanations
references to physical barriers like fences
mentions of fences
Negative Logits
olute      -0.74
Monetary   -0.73
amus       -0.70
olitan     -0.70
orean      -0.70
alg        -0.69
entials    -0.68
Nir        -0.67
ISTER      -0.66
Lear       -0.64
POSITIVE LOGITS
fence      1.33
fences     1.07
fencing    1.01
encl       0.84
perimeter  0.83
crossings  0.82
guarding   0.82
yard       0.81
wart       0.81
gates      0.80
Activations Density: 0.014%