Explanations
Phrases indicating physical positioning or hierarchy.
References to the concept of "beneath" or "underneath."
Negative Logits
yah          -0.87
ordan        -0.78
eln          -0.78
atic         -0.70
ern          -0.67
yrim         -0.67
Preferred    -0.67
andowski     -0.67
ebus         -0.65
agne         -0.65
Positive Logits
neath        1.09
eatures      1.07
ĸļ           0.98
pins         0.96
beneath      0.90
ĨĴ           0.86
tremend      0.83
¥ŀ           0.81
layers       0.81
underneath   0.80
Activation Density: 0.014%
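The page does not define the density figure. A common reading, assumed here, is the share of sampled tokens on which the feature's activation is above zero. The sketch below shows that calculation; the function name `activation_density` and the `threshold` parameter are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires. The source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count the tokens on which the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: the feature fires on 1 of 4 tokens.
density = activation_density([0.0, 0.0, 0.5, 0.0])
print(f"{density:.3%}")
```

Under this reading, a density of 0.014% would mean the feature fires on roughly 14 of every 100,000 sampled tokens.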