INDEX
Explanations
terms related to distinctions, relationships, and differences between concepts
Negative Logits
Barg        -0.17
alles       -0.16
Ľi          -0.15
Buchanan    -0.15
avec        -0.15
stor        -0.15
dG          -0.15
olle        -0.15
Bias        -0.15
Bloc        -0.14
Positive Logits
bet         0.42
bew         0.37
bet         0.32
bt          0.31
btw         0.29
b           0.27
bw          0.27
Bet         0.24
be          0.24
beet        0.24
Activations Density: 0.100%