INDEX
Explanations
the word "followers" along with phrases related to religious, political, or social groups
terms related to followers of various groups or ideologies
Negative Logits
Token           Logit
ces             -0.76
Genocide        -0.73
posing          -0.66
ced             -0.65
circumstance    -0.63
geon            -0.63
ospital         -0.63
Rim             -0.63
OUT             -0.63
Kob             -0.62
Positive Logits
Token           Logit
hip             1.29
followers       1.07
hips            0.96
follower        0.94
ollower         0.85
lia             0.84
wagon           0.82
wagon           0.75
adherent        0.73
ieve            0.71
Activations Density: 0.011%