Explanations
phrases that indicate substantial reasoning or justification for claims
Negative Logits
IsMutable       -0.68
defaultstate    -0.66
apimachinery    -0.60
\}\\            -0.60
stdafx          -0.56
متعلقه          -0.55
WebControls     -0.55
"});            -0.55
fjspx           -0.55
NameInMap       -0.53
Positive Logits
legitimate      0.66
legit           0.65
admit           0.64
Autoritní       0.64
valid           0.64
truth           0.63
Honest          0.59
的确            0.58
确实            0.58
VALID           0.58
Activations Density: 0.081%