Explanations
proper nouns, particularly names of individuals and titles
Negative Logits
Token       Logit
supers      -0.67
CLS         -0.67
theless     -0.66
Sakuya      -0.65
Dire        -0.64
Coastal     -0.62
GST         -0.62
horizont    -0.61
ACC         -0.61
USPS        -0.61
Positive Logits
Token       Logit
acci        0.95
aney        0.93
rower       0.93
beck        0.93
kowski      0.90
isner       0.90
riott       0.89
rage        0.89
ansky       0.88
zinski      0.88
Activations Density: 0.077%