INDEX
Explanations
instances of the word "Sure" with a strong activation
affirmative phrases that express certainty
Negative Logits
tnc         -0.69
humane      -0.68
foreseen    -0.66
andom       -0.60
mercial     -0.60
mone        -0.59
uese        -0.59
resil       -0.59
rights      -0.58
utenberg    -0.57
Positive Logits
ndra        0.99
ty          0.82
enough      0.80
entimes     0.78
fire        0.70
terday      0.70
footed      0.68
tack        0.67
ties        0.65
tt          0.65
Activations Density: 0.017%