INDEX
Explanations
the word "fact" and phrases indicating certainty or factual statements
Negative Logits
ULD             -0.15
asser           -0.15
Manson          -0.14
lte             -0.14
ledger          -0.14
se              -0.14
EventListener   -0.14
Ðĭ              -0.13
à¥įà¤Ĺ          -0.13
ams             -0.13
Positive Logits
fact            0.20
that            0.20
itious          0.17
bahwa           0.17
avana           0.16
egasus          0.16
585             0.15
Fact            0.15
that            0.14
586             0.14
Activations Density 0.014%
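
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions at which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: "fires" means activation strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 14 with a nonzero activation.
sample = [0.0] * 99_986 + [1.3] * 14

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumption, a density of 0.014% would mean the feature fires on roughly 14 of every 100,000 tokens, which fits a feature tied to a narrow set of tokens such as "fact" and "that".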