INDEX
Explanations
instances where factual information or knowledge is stated definitively
assertions and statements of knowledge
Negative Logits
ecycle    -0.63
upgr      -0.60
phrine    -0.60
attery    -0.60
twitch    -0.58
taking    -0.57
ãĤ¦       -0.57
pex       -0.57
ksh       -0.57
inances   -0.57
Positive Logits
anecd           1.04
definitively    0.97
unequivocally   0.92
that            0.79
nothing         0.71
darn            0.69
unanimously     0.68
confidently     0.68
instinctively   0.65
nothing         0.64
Activations Density: 0.138%