INDEX
Explanations
phrases related to simplicity and straightforwardness
Negative Logits
Token     Logit
ген       -0.15
lers      -0.14
ultimate  -0.14
ngr       -0.14
leri      -0.14
DDL       -0.14
_sink     -0.14
lide      -0.14
ub        -0.13
zel       -0.13
Positive Logits
Token     Logit
tons      0.32
ton       0.31
/simple   0.30
xes       0.26
/plain    0.23
st        0.22
-minded   0.21
TON       0.21
mente     0.19
-simple   0.19
Activations Density: 0.044%