INDEX
Explanations
regions, states, and countries
specific geographic locations and demographic information
Negative Logits
meanings      -0.73
consensual    -0.67
minist        -0.60
privileges    -0.60
agonist       -0.57
models        -0.57
iple          -0.56
ivities       -0.55
ways          -0.55
products      -0.55
Positive Logits
Cong          0.66
ague          0.64
itto          0.62
avia          0.61
boa           0.60
SEA           0.60
ataka         0.60
Chart         0.60
fared         0.59
okia          0.59
Activations Density 0.255%