Explanations
terms and phrases related to political manipulation and critique
Head Attr Weights
0:0.06
1:0.01
2:0.06
3:0.43
4:0.04
5:0.06
6:0.02
7:0.05
8:0.03
9:0.02
10:0.14
11:0.02
Negative Logits
pairs:-2.73
pads:-2.58
beds:-2.56
rooms:-2.53
english:-2.48
ranges:-2.42
servers:-2.41
monitors:-2.39
streams:-2.38
locations:-2.38
Positive Logits
betrayal:3.23
accomplishment:3.07
fallacy:3.02
setback:3.00
folly:2.99
inev:2.90
heresy:2.88
inconvenience:2.87
disgrace:2.87
imposition:2.85
Activations Density 0.725%