INDEX
Explanations
sentences related to ideology and beliefs
political language surrounding issues of manipulation and power dynamics
Negative Logits
DragonMagazine    -0.69
iple              -0.65
Originally        -0.61
Deadline          -0.60
raft              -0.59
©¶æ¥µ             -0.58
Warehouse         -0.57
ortium            -0.56
availability      -0.56
Medline           -0.55
Positive Logits
themselves        0.80
subord            0.77
ignor             0.77
immoral           0.73
unworthy          0.70
inconvenient      0.70
legitim           0.70
bigotry           0.70
ignorant          0.69
undue             0.69
Activations Density: 1.119%