INDEX
Explanations
quantitative data and numerical statistics related to research findings
Negative Logits
Token       Logit
Shib        -0.16
wi          -0.16
sympath     -0.15
Inactive    -0.15
esti        -0.14
List        -0.14
616         -0.13
list        -0.13
Ips         -0.13
assort      -0.13
Positive Logits
Token       Logit
abbo        0.16
ckt         0.15
oin         0.15
rint        0.14
askell      0.14
lean        0.14
pairs       0.14
ìĬµ         0.14
ripp        0.14
ãĤ¹ãĥ¬      0.14
Activations Density 0.051%
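
Two of the positive-logit tokens above, `ìĬµ` and `ãĤ¹ãĥ¬`, look like raw vocabulary strings rather than readable text. The source does not name the model or tokenizer, so the following is only a sketch under the assumption that the tokens are shown in the GPT-2-style byte-level BPE alphabet, where every byte is displayed as one printable character. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the source does not name the tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed vocabulary string back into the text it encodes."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # A token may hold only part of a multi-byte character, so undecodable
    # bytes are replaced instead of raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ìĬµ", "ãĤ¹ãĥ¬", "askell"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ìĬµ` corresponds to the Hangul syllable 습 and `ãĤ¹ãĥ¬` to the katakana スレ, while plain ASCII tokens such as `askell` are unchanged. If the dashboard uses a different tokenizer, the mapping does not apply.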