Explanations
words related to physical barriers or boundaries, specifically fences
references to fences
Negative Logits
olute       -0.78
Monetary    -0.72
orean       -0.70
amus        -0.68
izable      -0.67
olitan      -0.67
Nir         -0.67
alg         -0.67
ISTER       -0.66
entials     -0.64
Positive Logits
fence        1.32
fences       1.10
fencing      1.01
ModLoader    0.94
encl         0.85
este         0.83
wart         0.80
railing      0.80
perimeter    0.78
guarding     0.77
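The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how such lists are commonly produced. It assumes, though this page does not confirm it, that each score is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. The function name `logit_lists` and all vectors are hypothetical toy values, not data from this feature.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def logit_lists(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[Scored, Scored]:
    """Return the k most positive and the k most negative token scores.

    Assumption (hypothetical): score(token) = decoder_dir . unembed[token],
    i.e. the feature's direct effect on that token's logit.
    """
    scores = {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Hypothetical 3-dimensional toy model with a 5-token vocabulary.
decoder_dir = [1.0, 0.5, -0.2]
unembed = {
    "fence": [1.0, 0.4, 0.0],
    "railing": [0.6, 0.2, 0.1],
    "olute": [-0.5, -0.3, 0.4],
    "Monetary": [-0.4, -0.2, 0.3],
    "cat": [0.0, 0.1, 0.0],
}

positive, negative = logit_lists(decoder_dir, unembed, k=2)
print(positive)
print(negative)
```

This captures only the feature's direct path to the output logits and ignores any effect routed through later layers.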
Activation density: 0.011%
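Activation density is the share of tokens on which the feature fires. A density of 0.011% means the feature is active on about 11 of every 100,000 tokens. The sketch below assumes that "fires" means an activation strictly greater than zero, because the dashboard does not state its threshold. The function name `activation_density` and the sample data are illustrative only.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 11 active tokens out of 100,000.
sample = [0.0] * 99_989 + [2.5] * 11
density = activation_density(sample)
print(f"{density:.3f}%")  # prints 0.011%
```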