INDEX
Explanations
references to truth and the concept of truthfulness
Negative Logits
ilon      -0.15
oo        -0.14
list      -0.14
PLAN      -0.14
apia      -0.14
erno      -0.14
ita       -0.14
utton     -0.14
Edition   -0.14
igan      -0.13
Positive Logits
fully            0.27
fulness          0.22
/false           0.19
.djangoproject   0.18
ãģªãĤĭ           0.17
ylene            0.16
flen             0.16
enticator        0.15
soever           0.15
yntax            0.15
Activations Density 0.035%
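
The positive-logit token `ãģªãĤĭ` looks like a byte-level BPE rendering rather than corrupted text. The sketch below assumes the model's tokenizer uses the GPT-2 style byte-to-character table, which the page itself does not state; `bytes_to_unicode` and `decode_token` are illustrative helper names, not part of any tool referenced here. Under that assumption, the token can be mapped back to its raw bytes and read as UTF-8.

```python
def bytes_to_unicode():
    """Build a GPT-2 style table mapping each byte value to a printable character.

    Assumption: the tokenizer behind this page uses this byte-level scheme.
    """
    # Bytes that are already printable keep their own code point.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    chars = keep[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in keep:
            keep.append(b)
            chars.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(keep, chars)}


def decode_token(token):
    """Map a displayed token string back to bytes and decode it as UTF-8."""
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in token)
    # Partial multi-byte sequences are shown as replacement characters.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãģªãĤĭ"))  # → なる
```

Plain ASCII tokens such as `fully` pass through unchanged, so the same helper can be applied to every row of the logit lists.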