INDEX
Explanations
words related to geography or locations, possibly focusing on specific place names
words that contain specific consonant structures
Negative Logits
latent          -0.71
unexpl          -0.70
Magikarp        -0.69
admiration      -0.67
discrimination  -0.66
discriminate    -0.65
goodwill        -0.65
invention       -0.62
depressive      -0.61
infringing      -0.61
Positive Logits
veland  0.83
onica   0.80
heed    0.79
*/(     0.74
hemy    0.73
nces    0.73
voy     0.71
ico     0.70
berus   0.70
vette   0.69
Activations Density 0.090%