Explanations
mentions of professional relationships with peers
references to colleagues in a professional or work-related context
Negative Logits
Token      Logit
Control    -0.58
archs      -0.58
vable      -0.58
streets    -0.57
hills      -0.57
dark       -0.56
islands    -0.56
oo         -0.56
ãĥİ        -0.56
acion      -0.56
Positive Logits
Token          Logit
colleague      3.83
colleagues     2.59
classmate      2.07
comrade        1.99
teammate       1.97
cowork         1.97
collaborator   1.90
coworkers      1.70
friend         1.66
partner        1.52
Activations Density 0.016%