INDEX
Explanations
phrases and contexts emphasizing the importance of facts and factual accuracy
Negative Logits
Token     Logit
dings     -0.16
trace     -0.15
ati       -0.15
§         -0.14
/fw       -0.14
ÏĢÏĮ      -0.14
atic      -0.14
loo       -0.14
Silk      -0.14
Leer      -0.13
Positive Logits
Token     Logit
itious    0.23
ually     0.22
facts     0.19
facts     0.18
fully     0.18
ìĤ¬íķŃ    0.17
ãĥ³ãĤº    0.17
oring     0.17
fulness   0.16
nel       0.16
Activations Density 0.026%
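Several tokens in the logit lists (ÏĢÏĮ, ìĤ¬íķŃ, ãĥ³ãĤº) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below assumes the vocabulary uses the GPT-2-style byte-to-character table; that is an assumption, since the page does not name the tokenizer. The helper names `gpt2_byte_to_char` and `decode_token` are hypothetical and chosen only for illustration.

```python
def gpt2_byte_to_char():
    """Rebuild the byte -> character table used by GPT-2-style byte-level BPE.

    Assumption: the dashboard's tokenizer follows this scheme.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))  # U+00AE .. U+00FF
    )
    mapping = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to a code point at 256 and above.
    shift = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


# Inverse table: vocabulary character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in gpt2_byte_to_char().items()}


def decode_token(token):
    """Map a vocabulary string back to bytes, then decode those bytes as UTF-8.

    A token may hold only part of a multi-byte character, so invalid
    sequences are replaced instead of raising an error.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ìĤ¬íķŃ", "ãĥ³ãĤº", "facts"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ìĤ¬íķŃ decodes to the Korean 사항 and ãĥ³ãĤº to the Japanese ンズ, while plain ASCII tokens such as facts pass through unchanged.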