INDEX
Explanations
words related to physical structures, specifically walls
references to physical barriers or structural components
NEGATIVE LOGITS
Token       Logit
Published   -0.72
Gene        -0.71
avez        -0.69
heny        -0.68
ya          -0.66
Wild        -0.65
phrine      -0.65
EVA         -0.65
Pacific     -0.65
PubMed      -0.64
POSITIVE LOGITS
Token       Logit
walls        1.31
papers       1.00
Walls        0.97
wall         0.94
aby          0.90
abies        0.88
ceilings     0.78
paper        0.78
paintings    0.77
eries        0.76
Activation density: 0.011%