INDEX
Explanations
sentences with declarations or statements of truth
statements affirming a particular truth or reality
Negative Logits
stad                 -0.75
assetsadobe          -0.74
rador                -0.71
eor                  -0.65
iterranean           -0.62
andise               -0.62
inventoryQuantity    -0.60
throp                -0.60
allic                -0.59
effic                -0.59
Positive Logits
that                 1.09
THAT                 0.98
that                 0.77
nobody               0.77
undeniable           0.73
none                 0.72
simple               0.71
,                    0.70
adays                0.70
we                   0.70
Activations Density: 0.084%