INDEX
Explanations
mentions of physical structures or barriers like walls
references to a physical barrier or wall
Negative Logits
Ake         -0.83
Lear        -0.72
invoke      -0.71
registered  -0.71
Lear        -0.68
akh         -0.66
acca        -0.65
served      -0.64
Topics      -0.63
iblings     -0.63
Positive Logits
wall        3.68
walls       2.61
wall        2.24
Wall        2.03
Wall        1.88
Walls       1.85
fence       1.67
ceiling     1.66
barrier     1.47
firewall    1.30
Activations Density: 0.009%
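
The page does not say how the density figure is computed. A minimal sketch, assuming density means the fraction of sampled tokens on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens where the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 9 of 100,000 tokens.
sample = [1.5] * 9 + [0.0] * 99_991
print(f"{activation_density(sample):.3%}")  # prints 0.009%
```

Under this reading, a density of 0.009% would mean the feature is active on roughly 9 tokens in every 100,000, which would fit a feature that responds narrowly to wall-like tokens.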