INDEX
Explanations
instances of negation or contradiction in statements
Negative Logits
Gro       -0.15
aily      -0.14
åĶĩ       -0.14
ERVE      -0.14
ICA       -0.14
ursive    -0.14
ÙĬب       -0.14
agle      -0.14
ashtra    -0.14
Gro       -0.14
Positive Logits
unde                     0.17
apa                      0.15
trace                    0.14
scopes                   0.14
UIApplicationDelegate    0.14
bir                      0.14
696                      0.14
Trace                    0.14
burgh                    0.14
anza                     0.14
Activations Density 0.002%