Explanations
phrases that prompt critical thinking or reflection
Negative Logits
Hacker       -0.15
.named       -0.14
extr         -0.14
̧            -0.14
ighton       -0.14
idal         -0.14
yen          -0.14
ellaneous    -0.13
aphrag       -0.13
ches         -0.13
Positive Logits
åIJ§          0.17
McL          0.14
AMESPACE     0.14
(++          0.14
yourself     0.14
.poly        0.13
though       0.13
tout         0.13
Escort       0.13
asan         0.13
Activations Density: 0.064%