Explanations
concepts and phrases related to simplicity
Negative Logits
eenth     -0.17
asco      -0.16
esis      -0.15
402       -0.15
sel       -0.15
ngr       -0.15
rez       -0.14
lio       -0.14
ilities   -0.13
ulet      -0.13
Positive Logits
ton               0.41
tons              0.40
xes               0.34
TON               0.29
-minded           0.29
/simple           0.28
minded            0.25
ctic              0.25
straightforward   0.25
/basic            0.24
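The two lists above rank tokens by how strongly this feature pushes their logits down or up. As a minimal sketch of how such lists can be produced, assume you already have one logit-effect value per vocabulary token (how those values are computed is not stated on this page). The function name `top_and_bottom_logits` and its dict input format are hypothetical.

```python
def top_and_bottom_logits(
    effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs.

    `effects` maps each token to its logit effect for one feature
    (a hypothetical input format, assumed here for illustration).
    """
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative value first
    positive = ranked[::-1][:k]  # most positive value first
    return negative, positive


if __name__ == "__main__":
    # A small subset of the values listed above.
    sample = {"eenth": -0.17, "asco": -0.16, "esis": -0.15,
              "ton": 0.41, "tons": 0.40, "xes": 0.34}
    neg, pos = top_and_bottom_logits(sample, 2)
    print(neg)  # [('eenth', -0.17), ('asco', -0.16)]
    print(pos)  # [('ton', 0.41), ('tons', 0.4)]
```

Sorting once and slicing both ends keeps the sketch short; for a full vocabulary, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid sorting every token.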
Activations Density 0.037%
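Activation density is commonly read as the fraction of sampled tokens on which the feature fires, i.e. has an activation above zero. Under that reading, 0.037% means roughly 37 activating tokens per 100,000. The sketch below assumes that definition. The function name `activation_density` and the zero threshold are assumptions, not taken from this page.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Treating "fires" as activation > 0 is an assumed definition.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical data: 37 firing tokens out of 100,000.
    acts = [0.0] * 100_000
    for i in range(37):
        acts[i] = 1.0
    print(f"{activation_density(acts):.3%}")  # 0.037%
```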