INDEX
Explanations
Instances of claims and accusations about historical narratives, and of their verification or debunking
Negative Logits
hypoc         -0.16
μβ            -0.15
Unexpected    -0.15
Unexpected    -0.15
unpredict     -0.15
ronic         -0.15
unexpectedly  -0.14
تÙĦ           -0.14
akra          -0.14
umi           -0.14
Positive Logits
fiction       0.29
fabrication   0.29
unsupported   0.28
Fabric        0.26
fantasy       0.26
Fiction       0.26
hears         0.25
fanc          0.25
fabric        0.25
Fabric        0.25
Activations Density 0.406%