INDEX
Explanations
conspiracy-related terms and phrases
Negative Logits
ijk       -0.82
Thom      -0.77
inished   -0.74
older     -0.69
Chop      -0.69
oba       -0.68
esa       -0.66
TD        -0.65
zl        -0.65
isha      -0.64
Positive Logits
theorist     1.47
theorists    1.45
theories     1.34
theory       1.06
theor        1.02
ulent        0.96
conspir      0.93
conspiracy   0.93
eering       0.92
hatched      0.88
Activations Density 0.021%