INDEX
Explanations
words related to bricks
references to the word "brick."
Negative Logits
ntil        -0.85
ktop        -0.80
Voters      -0.71
judicial    -0.70
selection   -0.69
EVA         -0.68
ittee       -0.65
uters       -0.65
aeda        -0.65
ccording    -0.63
Positive Logits
bats        1.27
bricks      1.04
layer       1.01
brick       1.00
Brick       0.96
mort        0.92
shaw        0.91
yard        0.88
works       0.87
buster      0.84
Activations Density: 0.006%