INDEX
Explanations
proper nouns, specifically names of political figures
names of notable individuals, particularly those with the initial 'K'
Negative Logits
ashtra    -1.00
Ö¼        -0.91
REL       -0.78
»Ĵ        -0.75
Pwr       -0.70
BLIC      -0.69
INAL      -0.66
allic     -0.66
Vend      -0.65
IUM       -0.65
Positive Logits
ĸļ        0.81
isner     0.70
iffin     0.69
izabeth   0.63
unic      0.62
Oswald    0.62
flo       0.62
redo      0.61
hardt     0.60
eper      0.59
Activations Density 0.317%