Explanations
mentions of friends, family, allies, and supporters
phrases that emphasize relationships with friends and family
Negative Logits
gears       -0.71
oliberal    -0.68
ERO         -0.66
ument       -0.66
ulo         -0.64
Phase       -0.64
pollut      -0.63
Ball        -0.63
ãĥ´ãĤ¡      -0.63
Pwr         -0.62
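The token `ãĥ´ãĤ¡` in the negative list looks like a byte-level BPE vocabulary entry rather than corrupted text. A minimal sketch of how to decode such an entry follows, assuming the model uses a GPT-2 style byte-to-unicode mapping (the page does not name the tokenizer). The helper name `decode_token` is hypothetical.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's tokenizer follows this scheme.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table: displayed character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a vocabulary-form token back into readable text (hypothetical helper)."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # A single token can hold an incomplete UTF-8 sequence, so replace bad bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("\u00e3\u0125\u00b4\u00e3\u0124\u00a1"))
```

Under that assumption the entry decodes to the katakana pair ヴァ. The other tokens in the list are plain ASCII and decode to themselves.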
Positive Logits
neighbours       0.87
neighbors        0.85
coworkers        0.84
strangers        0.84
comrades         0.81
acquaintances    0.80
relatives        0.78
whom             0.77
classmates       0.76
fellow           0.76
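The two lists can be read as the extremes of a single ranking of vocabulary tokens by the feature's effect on their output logits. The sketch below illustrates that reading with a handful of the values listed here. The dictionary `logit_effects` and the function `top_and_bottom` are hypothetical names, and how the page actually computes the scores is not stated.

```python
import heapq

# A few (token, logit effect) pairs copied from the lists above.
logit_effects = {
    "neighbours": 0.87,
    "neighbors": 0.85,
    "coworkers": 0.84,
    "gears": -0.71,
    "oliberal": -0.68,
    "ERO": -0.66,
}


def top_and_bottom(effects: dict[str, float], k: int):
    """Return the k most promoted and k most suppressed tokens (hypothetical helper)."""
    promoted = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    suppressed = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return promoted, suppressed


if __name__ == "__main__":
    promoted, suppressed = top_and_bottom(logit_effects, k=2)
    print("promoted:", promoted)
    print("suppressed:", suppressed)
```

The promoted tokens are all nouns for people one has a relationship with, which matches the explanations at the top of the page. The suppressed tokens show no obvious shared theme.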
Activations Density: 0.310%
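An activation density figure like this is commonly read as the share of sampled tokens on which the feature fires at all. The sketch below works under that assumption, which the page itself does not confirm. The function name `activation_density` and the sample data are hypothetical and synthetic.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds the threshold.

    Assumption: density counts any activation above zero as "firing".
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Synthetic example: the feature fires on 3 of 1000 token positions.
    sample = [0.0] * 997 + [1.2, 0.4, 3.1]
    print(f"{activation_density(sample):.3%}")
```

On that reading, a density of 0.310% means the feature is active on roughly 3 of every 1000 tokens in the sample, so it is sparse.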