Explanations
names of individuals or groups of people
proper nouns, particularly names and places
New Auto-Interp
Negative Logits
endowed     -0.71
sterling    -0.65
Clover      -0.62
Calder      -0.61
NCT         -0.57
BILITY      -0.56
Pell        -0.56
enegger     -0.55
Frozen      -0.55
Kessler     -0.55
Positive Logits
atoon       0.87
oslav       0.85
adic        0.82
pta         0.82
ivas        0.76
inx         0.75
oku         0.75
udos        0.75
atchewan    0.74
iman        0.74
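The two lists above rank vocabulary tokens by how strongly the feature pushes their logits down or up. A minimal sketch of how such lists might be produced, assuming (as is common for SAE dashboards, but not confirmed by this page) that each token's score is the dot product of the feature's decoder direction with that token's unembedding column. The function names and the toy 2-dimensional vectors are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by dotting the (assumed) feature decoder direction
    with that token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(direction, col)) for tok, col in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy model: 2-dimensional residual stream, 3-token vocabulary.
    direction = [1.0, -0.5]
    unembed = {
        "oslav": [0.6, -0.5],
        "endowed": [-0.4, 0.6],
        "the": [0.1, 0.2],
    }
    effects = logit_effects(direction, unembed)
    neg, pos = top_and_bottom(effects, k=1)
    print("negative:", neg)
    print("positive:", pos)
```

Real dashboards typically apply this over the full vocabulary and may fold in layer-norm scaling; the sketch ignores that.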
Activation Density: 0.093%
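Assuming activation density means the fraction of tokens on which the feature's activation exceeds zero (a common convention, not stated on this page), 0.093% corresponds to the feature firing on roughly one token in 1 / 0.00093 ≈ 1075. A minimal sketch of that calculation; the `activation_density` helper and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature fires (activation > threshold)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 93 of 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(0, 93 * 1000, 1000):
        acts[i] = 1.0
    print(f"{activation_density(acts):.3%}")
```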