INDEX
Explanations
mentions of the word "My" followed by a proper noun
references to specific media titles or franchises
Negative Logits
bott            -0.79
ooz             -0.73
yourselves      -0.69
forcefully      -0.68
inelli          -0.62
wisely          -0.62
igham           -0.62
characterized   -0.62
ieri            -0.62
ramps           -0.62
Positive Logits
stery           1.63
riad            1.53
anmar           1.40
stic            1.33
ths             1.26
chal            1.12
ocard           1.12
ster            1.10
croft           1.08
Space           1.02
Activations Density: 0.048%