INDEX
Explanations
references to powerful and influential individuals, such as moguls or gurus
words associated with individuals who have substantial influence in their respective fields
Negative Logits
ood          -0.76
endars       -0.72
alam         -0.71
ī            -0.70
ridges       -0.70
abet         -0.70
ulz          -0.68
elt          -0.68
ria          -0.67
°            -0.67
Positive Logits
mogul         1.04
guru          0.84
frontrunner   0.82
dinand        0.75
esses         0.73
coon          0.72
rabbi         0.71
tec           0.70
mog           0.70
pedd          0.69
Activation Density: 0.015%