INDEX
Explanations
job titles or professional roles
positions or titles related to leadership and organizational roles
Negative Logits
Token          Logit
notions        -0.65
assumptions    -0.63
conclusions    -0.61
strangers      -0.61
mug            -0.61
whim           -0.60
humidity       -0.60
fools          -0.59
myths          -0.58
ushes          -0.58
Positive Logits
Token          Logit
overseeing     1.03
alongside      0.82
orney          0.76
Assist         0.74
ature          0.69
esses          0.68
piece          0.67
respectively   0.67
until          0.67
Leilan         0.66
Activations Density: 0.279%