Explanations
phrases indicating confirmation or assurance of previous statements
Negative Logits
this
-0.18
this
-0.18
那样
-0.15
-0.15
zer
-0.15
这一
-0.15
ts
-0.15
.ts
-0.15
thus
-0.15
pilot
-0.14
Positive Logits
happen
0.17
happening
0.17
happens
0.17
저
0.15
WithTitle
0.15
팔
0.15
happened
0.15
ーバ
0.15
Rick
0.14
Them
0.14
Activations Density 0.275%