INDEX
Explanations
proper nouns related to famous personalities
Negative Logits
hips        -0.75
ngth        -0.71
hip         -0.71
atively     -0.71
alid        -0.70
awaru       -0.69
aries       -0.68
los         -0.66
rentices    -0.66
alam        -0.65
Positive Logits
Springer    1.05
Kramer      0.88
stein       0.78
Fal         0.77
Coy         0.75
Angelo      0.73
ono         0.72
Garcia      0.71
Reese       0.71
ãĤ§         0.71
Activations Density: 0.010%