INDEX
Explanations
sports, study, cats, diversity
Negative Logits
kiddos         0.77
badass         0.73
underwhelming  0.70
moniker        0.68
inherently     0.65
workaround     0.65
Leveraging     0.63
leveraging     0.62
shenanigans    0.62
bolstering     0.62
Positive Logits
newspapers     0.64
businessmen    0.64
spapers        0.59
policemen      0.58
famous         0.57
clothes        0.55
clothes        0.55
Businessman    0.54
berühm         0.52
spoilt         0.52
Activations Density: 0.020%