INDEX
Explanations
References to a character or individual identified by the initial "G."
Negative Logits
cients         -0.61
GP             -0.60
sincere        -0.59
sto            -0.58
Gym            -0.57
train          -0.57
subscribers    -0.56
Fifth          -0.56
stars          -0.56
Git            -0.56
Positive Logits
rieve          1.04
ARY            0.99
entry          0.95
arten          0.94
iannopoulos    0.94
ERAL           0.93
wyn            0.90
OULD           0.88
rosso          0.87
iry            0.86
Activations Density: 0.030%