Explanations
words related to various countries
words related to wealth and affluent individuals
Negative Logits
Token        Logit
nonexistent  -0.67
otherwise    -0.67
obs          -0.64
slow         -0.63
INS          -0.59
kan          -0.58
mol          -0.58
Inf          -0.57
orth         -0.57
ramp         -0.56
Positive Logits
Token   Logit
aire    4.94
aires   3.37
naire   2.35
naires  1.40
ary     1.39
airs    1.33
air     1.32
aries   1.26
arie    1.25
ario    1.24
Activations Density: 0.014%