Explanations
state names
references to governmental or political entities
Negative Logits
swe          -0.72
mant         -0.69
pudding      -0.66
figur        -0.65
mould        -0.65
newcomers    -0.64
heels        -0.64
arom         -0.64
Argent       -0.64
narciss      -0.63
Positive Logits
State        3.91
state        2.68
STATE        2.50
States       2.46
State        2.44
STATE        2.40
state        2.05
states       1.91
States       1.46
Gov          1.44
Activations Density 0.010%