Explanations
phrases related to personal curiosity or contemplation
expressions of curiosity and personal introspection
Negative Logits
seiz        -0.70
Bung        -0.63
squats      -0.62
+++         -0.62
metic       -0.61
-+-+        -0.61
misunder    -0.61
\'          -0.61
ensu        -0.57
Pros        -0.57
Positive Logits
ãĤ¤ãĥĪ      0.68
idge        0.67
likewise    0.66
ern         0.65
istani      0.65
èĥ          0.65
éĥ          0.64
nostic      0.63
erville     0.63
erning      0.62
Activations Density 1.365%