INDEX
Explanations
words related to followers, listeners, supporters, and students
references to groups of people or audiences associated with a leader or figure
Negative Logits
cer  -0.74
Thing  -0.63
aton  -0.61
Kov  -0.58
Obst  -0.56
otherapy  -0.56
uve  -0.56
wcs  -0.56
aeper  -0.55
pour  -0.55
Positive Logits
hip  1.11
selves  1.09
folk  1.05
hips  1.00
counterparts  0.97
ervative  0.95
mates  0.95
mith  0.95
'  0.93
heet  0.93
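The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A minimal sketch of how such lists are commonly derived follows. It assumes the usual dashboard convention of projecting the feature's decoder vector onto the model's unembedding matrix. The function names, the toy vectors, and the tokens "a", "b", "c" are hypothetical and do not reflect this feature's actual weights.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Score every vocab token by decoder_vec @ W_U, sorted ascending.

    decoder_vec: list of d_model floats (hypothetical feature decoder direction)
    unembed: d_model rows, each holding len(vocab) floats (assumed W_U layout)
    """
    d_model = len(decoder_vec)
    scores = []
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        s = sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        scores.append((tok, s))
    return sorted(scores, key=lambda pair: pair[1])


def top_and_bottom(decoder_vec, unembed, vocab, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = logit_effects(decoder_vec, unembed, vocab)
    negative = ranked[:k]        # most suppressed tokens first
    positive = ranked[::-1][:k]  # most promoted tokens first
    return negative, positive


# Toy usage with made-up numbers (d_model = 2, vocab of 3 tokens).
neg, pos = top_and_bottom(
    [1.0, -0.5],
    [[0.2, 1.0, -0.4], [0.6, -1.0, 0.0]],
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

Real dashboards may also fold in layer-norm scaling or normalize the scores before ranking. The sketch omits those details.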
Activation Density 0.180%
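An activation density of 0.180% means the feature fires on roughly 18 of every 10,000 token positions. The sketch below shows this calculation. It assumes "fires" means an activation strictly above zero, which is a common convention for SAE features. The helper names and the toy activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def as_percent(density):
    """Format a fraction as a percentage with three decimals, e.g. 0.0018 -> '0.180%'."""
    return f"{density * 100:.3f}%"


# Toy sample: 18 firing positions out of 10,000 tokens.
acts = [0.0] * 9982 + [0.5] * 18
print(as_percent(activation_density(acts)))
```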