INDEX
Explanations
phrases related to cognitive biases and logical fallacies
Negative Logits
Canaver        -0.73
outhern        -0.67
Torrent        -0.65
undrum         -0.64
mobi           -0.64
cephal         -0.61
Cosponsors     -0.61
Charity        -0.60
çīĪ            -0.60
Luxem          -0.60
Positive Logits
oneself        0.98
)).            0.98
predetermined  0.93
?".            0.89
theirs         0.88
etc            0.83
themselves     0.83
others         0.82
them           0.81
somebody       0.80
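The two logit lists are the ten most negative and ten most positive per-token logit effects for this feature. A minimal sketch of how such lists can be selected, assuming the per-token effects are already available as a mapping from token to value (this page does not say how they are computed, and a real dashboard would rank the whole vocabulary rather than only the twenty tokens shown here):

```python
# Per-token logit effects copied from the lists above. Assumption: in practice
# this mapping would cover the full vocabulary, not just these 20 tokens.
logit_effects = {
    "Canaver": -0.73, "outhern": -0.67, "Torrent": -0.65, "undrum": -0.64,
    "mobi": -0.64, "cephal": -0.61, "Cosponsors": -0.61, "Charity": -0.60,
    "çīĪ": -0.60, "Luxem": -0.60,
    "oneself": 0.98, ")).": 0.98, "predetermined": 0.93, '?".': 0.89,
    "theirs": 0.88, "etc": 0.83, "themselves": 0.83, "others": 0.82,
    "them": 0.81, "somebody": 0.80,
}


def top_and_bottom(effects: dict[str, float], k: int):
    """Return (positive, negative): the k highest and k lowest (token, value) pairs.

    Python's sort is stable (also with reverse=True), so tied values keep
    their insertion order.
    """
    positive = sorted(effects.items(), key=lambda item: item[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda item: item[1])[:k]
    return positive, negative


positive, negative = top_and_bottom(logit_effects, k=10)
print("Positive:", positive)
print("Negative:", negative)
```

Sorting once per direction keeps the sketch simple; for a full vocabulary, `heapq.nlargest` / `heapq.nsmallest` would avoid a full sort.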
Activation Density: 7.723%
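A minimal sketch of the activation density figure, assuming it means the fraction of sampled token positions on which the feature's activation is above zero. The page does not state the threshold or the token sample, so both the `threshold` default and the activations below are hypothetical:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: the dashboard counts a position as "active" when the
    activation is strictly greater than zero.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical activations over 8 token positions (illustrative only).
acts = [0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(acts):.3%}")  # 12.500%
```

Under this reading, a density of 7.723% means the feature fires on roughly 1 token in 13.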