INDEX
Explanations
phrases related to asserting a statement or claim
instances of the word "assert" and its variations
Negative Logits
Bake        -0.78
Carbuncle   -0.68
ppo         -0.67
mys         -0.63
shows       -0.61
clean       -0.61
Hort        -0.60
nton        -0.60
carb        -0.60
oho         -0.60
Positive Logits
iveness     1.02
ively       1.00
antly       0.93
uably       0.93
olated      0.90
ive         0.90
ions        0.90
urances     0.90
ements      0.89
uable       0.89
Activations Density: 0.022%