INDEX
Explanations
names of persons or places
proper nouns, specifically names of people and places
Negative Logits
heses        -0.71
mble         -0.69
agne         -0.65
idy          -0.64
hammad       -0.62
hesis        -0.62
rieg         -0.61
Directors    -0.59
hetical      -0.59
wig          -0.58
Positive Logits
lot          0.86
dro          0.78
itous        0.77
TAIN         0.77
dropping     0.76
tain         0.76
ourn         0.73
ols          0.71
Oo           0.71
hig          0.70
Activations Density: 0.137%