INDEX
Explanations
facts or information
assertions of knowledge or certainty
Negative Logits
pex          -0.69
cit          -0.68
nesota       -0.67
isco         -0.64
cific        -0.64
offic        -0.64
coni         -0.64
conservancy  -0.63
ksh          -0.63
oshenko      -0.63
Positive Logits
ledged       0.95
ledge        0.92
edge         0.77
nothing      0.76
ourselves    0.75
lege         0.75
how          0.71
LED          0.70
beforehand   0.68
nothing      0.65
Activations Density: 0.048%
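
A density figure like the one above can be read as the share of sampled token positions on which the feature fires. The sketch below is illustrative only: it assumes "density" means the fraction of positions with an activation above zero, and the function name `activation_density`, the `threshold` parameter, and the sample counts are all hypothetical rather than taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 12 active positions out of 25,000 token positions.
sample = [0.0] * 24_988 + [1.3] * 12
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.048%
```

Under that reading, a density of 0.048% would correspond to roughly one active position in every 2,000 sampled tokens.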