Explanations
terms related to extreme political ideologies
references to extreme political ideologies
Negative Logits
Scrib         -0.78
attraction    -0.67
Doodle        -0.66
pumps         -0.66
Sugar         -0.66
submission    -0.66
metadata      -0.65
couple        -0.65
Parenthood    -0.64
McH           -0.64
Positive Logits
reaching      1.70
fetched       1.69
sighted       1.64
ranging       1.49
fl            1.43
seeing        1.42
left          1.39
right         1.33
too           1.30
away          1.24
Activations Density 0.018%