Explanations
references to well-known individuals, particularly focusing on their achievements or characteristics
Negative Logits
Token        Logit
oner         -0.15
toa          -0.15
ula          -0.15
ains         -0.15
WithEmail    -0.15
uard         -0.14
Bitte        -0.14
éī           -0.14
ilan         -0.14
gin          -0.14
Positive Logits
Token        Logit
best         0.21
throughout   0.21
fond         0.19
amongst      0.19
known        0.18
-best        0.18
among        0.17
quantity     0.17
/generated   0.17
far          0.17
Activations Density 0.038%