INDEX
Explanations
phrases related to ensuring things are done correctly or in a specified manner
phrases related to ensuring safety, compliance, and proper functioning in various contexts
Negative Logits
pity         -0.73
unlucky      -0.67
glim         -0.65
famous       -0.63
unwitting    -0.63
bluff        -0.62
Worse        -0.61
wonder       -0.59
whining      -0.59
delusional   -0.58
Positive Logits
irrespective   1.02
throughout     1.01
regardless     1.00
wherever       0.91
consistent     0.85
whenever       0.85
across         0.79
whilst         0.78
cellence       0.76
accordingly    0.76
Activations Density 0.458%
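
The density figure is most naturally read as the share of sampled token positions on which the feature fires. The dashboard does not state its definition, so that reading is an assumption, and so are the function name `activation_density`, the zero threshold, and the made-up sample below. A minimal sketch under those assumptions:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "firing" means strictly greater than the threshold (0.0 here).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.458% would mean the feature is active on roughly 4 to 5 of every 1000 sampled tokens.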