INDEX
Explanations
instances of the word "fact" and related phrases indicating certainty or truth
Negative Logits
افت            -0.17
else           -0.16
EventListener  -0.15
anded          -0.15
adolu          -0.15
stras          -0.14
theless        -0.14
ZE             -0.13
sea            -0.13
EMPLARY        -0.13
Positive Logits
fact           0.22
oring          0.19
um             0.18
ually          0.17
itious         0.17
eur            0.17
icity          0.16
oid            0.16
Fact           0.16
ularity        0.16
Activations Density 0.017%