INDEX
Explanations
statements and discussions emphasizing objective facts and their verification
Negative Logits
eb       -0.15
innen    -0.15
ded      -0.15
iggs     -0.15
dy       -0.14
fine     -0.14
itz      -0.14
ocker    -0.14
tle      -0.14
.idx     -0.14
Positive Logits
facts    0.23
facts    0.20
itious   0.18
intl     0.18
urtle    0.17
Fact     0.16
Facts    0.16
aland    0.16
oring    0.16
gravity  0.16
Activations Density: 0.026%