INDEX
Explanations
places or entities related to government and authority
references to governmental and institutional entities
Negative Logits
gradient    -0.69
DRAG        -0.68
luster      -0.63
Dover       -0.62
eatures     -0.61
Dur         -0.60
ãĥĥãĥĪ      -0.60
ivities     -0.60
Stain       -0.59
POR         -0.59
Positive Logits
who          1.13
who          1.08
whom         1.07
whose        0.87
illegally    0.81
whose        0.78
swear        0.75
flock        0.75
interested   0.74
subscribing  0.70
Activations Density: 0.419%