INDEX
Explanations
references to physical locations, particularly hills
mentions of hills
Negative Logits
Token        Logit
uality       -0.88
ワ           -0.73
◼            -0.69
Mach         -0.68
░░           -0.68
ECA          -0.67
~~~~         -0.67
ually        -0.67
Attention    -0.66
Marketable   -0.65
Positive Logits
Token        Logit
side         1.10
tops         1.05
hill         1.00
frog         0.94
hills        0.93
slopes       0.92
top          0.91
stead        0.84
castle       0.79
hill         0.79
Activations Density 0.013%