INDEX
Explanations
references to beliefs or conclusions reached about certain situations or arguments
Negative Logits
oses      -0.74
pec       -0.73
rack      -0.71
alysed    -0.69
ax        -0.69
uno       -0.68
leen      -0.67
eal       -0.66
aukee     -0.66
ified     -0.65
Positive Logits
there     0.99
although  0.94
neither   0.91
many      0.87
unlike    0.86
none      0.82
these     0.81
they      0.81
nobody    0.79
"[        0.77
Activations Density 0.244%