INDEX
Explanations
numerical data and references to measurements
Negative Logits
inh         -0.14
ober        -0.14
yz          -0.14
aná         -0.13
à¹ĩà¸Ńà¸ģ   -0.13
eness       -0.13
esk         -0.13
otal        -0.13
Marker      -0.13
cunning     -0.12
Positive Logits
ayo         0.17
ÙĪØ²        0.14
Dickinson   0.14
ãĢħ         0.13
iger        0.13
fy          0.13
aya         0.13
ico         0.13
mate        0.13
uali        0.13
Activations Density 0.003%