INDEX
Explanations
phrases related to ethics and standards
phrases related to ethical standards and responsibility
Negative Logits
Token         Logit
latent        -0.65
nifty         -0.64
unwitting     -0.62
giant         -0.62
obscure       -0.61
coincidence   -0.61
moot          -0.60
tantal        -0.59
bombs         -0.59
ado           -0.59
Positive Logits
Token                Logit
regardless           1.11
irrespective         1.06
wherever             0.87
respectfully         0.80
(no visible token)   0.78
throughout           0.78
RESP                 0.77
whenever             0.76
safegu               0.75
.''.                 0.75
Activations Density: 0.605%
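Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes that reading and assumes "fires" means an activation above zero; the `activation_density` function, its `threshold` parameter, and the toy data are all hypothetical and are not the data behind the figure above.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy data (an assumption, not the real activations):
# 121 active positions out of 20,000.
sample = [0.0] * 19879 + [1.5] * 121
density = activation_density(sample)
print(f"{density:.3f}%")  # prints 0.605%
```

At a density of this size, the feature would be active on roughly 6 of every 1,000 positions.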