INDEX
Explanations
verbs in the passive voice
phrases that indicate expectations or norms
Negative Logits
tex          -0.77
roads        -0.75
sv           -0.72
lite         -0.70
lake         -0.69
mens         -0.68
collar       -0.68
lust         -0.67
pu           -0.67
east         -0.65
Positive Logits
disclaim      0.80
escription    0.73
explan        0.71
inconsist     0.69
entious       0.67
conflic       0.67
DonaldTrump   0.67
ELF           0.67
pport         0.65
¾             0.64
Activations Density 0.027%