Explanations
words related to authority and power dynamics
historical nouns
Negative Logits
rechter             -0.36
“                   -0.31
/                   -0.30
-                   -0.29
<em>                -0.29
(token not shown)   -0.29
fuese               -0.29
...                 -0.28
detail              -0.28
(                   -0.28
Positive Logits
CloseOperation      0.69
ſtate               0.69
RegressionTest      0.69
setVerticalGroup    0.69
ſont                0.68
ſei                 0.66
ſind                0.65
newOwner            0.65
ſol                 0.65
medriver            0.65
Activations Density 0.012%