INDEX
Explanations
pronouns referring to people or things
pronouns referring to people
Negative Logits
ielding   -0.78
east      -0.68
semb      -0.68
shown     -0.66
Pwr       -0.66
cond      -0.65
ricanes   -0.65
Rex       -0.63
Fla       -0.63
athon     -0.62
Positive Logits
Majesty   1.00
illac     0.81
majesty   0.79
fucking   0.78
mos       0.77
smokes    0.77
fuckin    0.76
fucked    0.72
behav     0.71
Sly       0.69
Activations Density: 0.487%