INDEX
Explanations
phrases indicating a clear statement or declaration
phrases emphasizing clarity or making something explicit
Negative Logits
ickle      -0.65
Luck       -0.65
luck       -0.64
gins       -0.62
Derby      -0.62
ools       -0.61
apter      -0.59
Variant    -0.59
lymp       -0.59
otin       -0.58
Positive Logits
explicitly       0.84
unamb            0.83
upfront          0.83
unequivocally    0.82
emphatically     0.81
ances            0.78
commitments      0.77
unequiv          0.77
plainly          0.75
distinction      0.74
Activations Density 0.137%