INDEX
Explanations
instances of uncertainty or negation
Negative Logits
Houſe           -0.67
ValueStyle      -0.61
houſe           -0.57
preſent         -0.57
purpoſe         -0.55
RegressionTest  -0.54
leſs            -0.54
AssemblyTitle   -0.52
wiſe            -0.52
ſelves          -0.51
Positive Logits
need            0.67
have            0.63
get             0.57
DID             0.55
does            0.54
Does            0.54
do              0.53
不              0.52
give            0.52
did             0.52
Activations Density: 0.142%