Explanations
specific words related to professional responsibilities and ethical considerations
Negative Logits
ancies        -0.73
onyms         -0.67
iries         -0.66
dimension     -0.66
undreds       -0.66
obos          -0.66
verning       -0.65
irie          -0.64
roofs         -0.64
racuse        -0.63
Positive Logits
considering   1.15
indeed        1.11
nonetheless   1.06
compared      0.94
behold        0.80
imaginable    0.77
stros         0.73
nevertheless  0.70
akin          0.70
worth         0.68
Activations Density: 0.248%