INDEX
Explanations
references to stars or celebrities
instances of the word "star"
Negative Logits
apons        -1.01
»Ĵ           -1.00
nsic         -0.95
odcast       -0.92
veyard       -0.88
berra        -0.88
ibaba        -0.85
Downloadha   -0.85
ĵĺ           -0.85
iblings      -0.84
Positive Logits
star         1.13
stars        1.12
star         0.92
stars        0.92
light        0.85
attraction   0.84
liner        0.83
lit          0.78
burst        0.75
lite         0.75
Activations Density: 0.011%