INDEX
Explanations
references to achieving fame or recognition
references to fame and its different aspects
Negative Logits
Token       Logit
OTS         -0.63
groceries   -0.62
atives      -0.61
ajor        -0.60
Agg         -0.59
halves      -0.58
ework       -0.58
Genocide    -0.57
tense       -0.56
lements     -0.56
Positive Logits
Token       Logit
æ©          0.85
fame        0.83
stroke      0.80
seeker      0.79
tremend     0.79
rities      0.75
stars       0.75
frey        0.75
":""},{"    0.73
certs       0.73
Activations Density: 0.025%