INDEX
Explanations
phrases indicating a division or categorization of items into two distinct groups
classifications or categories and how they can be divided
Negative Logits
wonders          -0.70
pores            -0.70
daq              -0.65
boards           -0.65
certs            -0.61
board            -0.59
understatement   -0.59
lves             -0.58
rattled          -0.58
everywhere       -0.58
Positive Logits
Firstly          1.28
Firstly          0.98
namely           0.89
First            0.85
Either           0.84
Ones             0.77
hemat            0.72
first            0.70
First            0.69
thodox           0.66
Activations Density: 0.158%