Explanations
references to "Simple" concepts or frameworks
Negative Logits
Token     Logit
eenth     -0.17
402       -0.15
sel       -0.15
ngr       -0.14
leri      -0.14
esimal    -0.14
esco      -0.14
ãĥ³ãĤ°    -0.14
403       -0.14
ub        -0.14
Positive Logits
Token     Logit
ton       0.39
tons      0.37
xes       0.34
-minded   0.31
TON       0.27
/simple   0.26
minded    0.26
ctic      0.26
st        0.24
/plain    0.23
Activations Density 0.038%