INDEX
Explanations
words related to physical structures like walls
references to walls and related structures
Negative Logits
Gene        -0.84
phrine      -0.72
lihood      -0.71
Pacific     -0.69
Eug         -0.66
Wild        -0.66
delegated   -0.65
orean       -0.64
MENTS       -0.64
milo        -0.63
Positive Logits
papers      1.27
abies       1.19
aby         1.12
paper       1.00
clock       0.97
thickness   0.92
wall        0.89
walls       0.87
decoration  0.86
tops        0.85
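Dashboards like this one commonly rank tokens by the dot product of the feature's decoder direction with each token's unembedding vector, then list the most negative and most positive. The sketch below assumes that method (the source does not state how its logits were computed). The names `logit_effects` and `top_and_bottom` are hypothetical, and the 2-dimensional weights are made-up toy numbers, not the model's real weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example with invented weights (hypothetical, for illustration only).
    decoder_dir = [1.0, 0.5]
    unembed = {
        "wall": [0.8, 0.2],
        "paper": [0.6, 0.4],
        "Gene": [-0.7, -0.3],
        "clock": [0.1, 0.9],
    }
    effects = logit_effects(decoder_dir, unembed)
    positive, negative = top_and_bottom(effects, k=2)
    print("positive:", positive)
    print("negative:", negative)
```

Sorting once and slicing from both ends yields both lists in the same order the dashboard uses: negatives from most to least negative, positives from most to least positive.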
Activation density: 0.030%
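Activation density is usually reported as the share of sampled tokens on which the feature's activation is above zero. Assuming that definition, 0.030% is a fraction of 0.0003, or roughly 3 firings per 10,000 tokens, which makes this a very sparse feature. The helpers below are a hypothetical sketch of that arithmetic, not the dashboard's actual code; the zero threshold is an assumption.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the threshold (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


def expected_firings(density_percent: float, num_tokens: int) -> float:
    """Convert a density given in percent into an expected count of firing tokens."""
    return density_percent / 100 * num_tokens


if __name__ == "__main__":
    # One nonzero activation out of four tokens gives a density of 0.25 (25%).
    print(activation_density([0.0, 0.0, 1.2, 0.0]))
    # At 0.030% density, about 300 of 1,000,000 tokens would activate the feature.
    print(expected_firings(0.030, 1_000_000))
```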