INDEX
Explanations
phrases indicating difficulty or complexity in understanding issues
Negative Logits
punt        -0.15
vel         -0.14
<Service    -0.14
ãĥ³ãĤ¯      -0.14
Äĵ          -0.14
Booth       -0.14
udu         -0.14
kb          -0.14
contra      -0.13
ware        -0.13
Positive Logits
simple           0.23
simples          0.20
simplest         0.18
basic            0.18
Simple           0.18
simple           0.17
straightforward  0.17
andel            0.17
/basic           0.17
-simple          0.16
Activations Density: 0.145%