INDEX
Explanations
terms associated with conspiracy theories and references to power dynamics
Negative Logits
&___             -0.78
quartile         -0.70
endpush          -0.69
'\\;'            -0.68
poffe            -0.68
opedic           -0.68
enterOuterAlt    -0.67
ільки            -0.65
GOLF             -0.65
ſch              -0.63
Positive Logits
healing          0.57
Portail          0.52
Healing          0.50
Hol              0.49
~                0.49
yim              0.47
).]              0.47
Leitung          0.46
Gaia             0.46
healing          0.45
Activations Density 0.678%