Explanations
references to the concept of 'gold'
references to "gold."
Negative Logits
Token         Logit
zee           -0.82
Explicit      -0.78
Romance       -0.73
Violent       -0.71
========      -0.71
ATIONS        -0.70
Consent       -0.70
Alive         -0.69
Violence      -0.67
Goodbye       -0.66
Positive Logits
Token         Logit
vertisement   1.17
coins         1.13
smith         1.07
medal         1.04
fish          1.01
coin          0.97
jewelry       0.96
mine          0.91
oxide         0.90
medals        0.89
Activations Density: 0.036%