INDEX
Explanations
words related to political or geographical context
Head Attr Weights
0:0.08
1:0.01
2:0.06
3:0.45
4:0.02
5:0.09
6:0.02
7:0.04
8:0.03
9:0.01
10:0.10
11:0.02
Negative Logits
failures      -2.01
inspections   -1.97
Testing       -1.93
pressures     -1.92
resilience    -1.90
efforts       -1.89
storms        -1.89
workers       -1.89
unprepared    -1.86
shoppers      -1.85
Positive Logits
noun          3.98
slang         3.72
surname       3.71
adjective     3.70
name          3.67
abbre         3.66
phrase        3.60
suffix        3.60
nickname      3.43
descriptor    3.40
Activations Density 1.284%