Explanations
proper nouns
proper nouns, particularly names and places
Negative Logits
シャ      -0.73
Beck      -0.72
hardt     -0.67
adr       -0.67
IGH       -0.67
IFE       -0.63
arist     -0.62
omes      -0.62
sorts     -0.61
UFF       -0.61
Positive Logits
etsk      0.95
Pok       0.93
asus      0.93
oshenko   0.88
ongyang   0.86
atern     0.83
etry      0.82
ilon      0.77
gran      0.77
etary     0.77
Activations Density 0.042%