Explanations
questions and phrases that indicate curiosity or inquiry
Negative Logits
estyles          -0.89
ursions          -0.83
overfl           -0.80
deployments      -0.76
refin            -0.72
withdrawals      -0.72
ulations         -0.71
offsets          -0.71
registrations    -0.71
tours            -0.70
Positive Logits
whoever          1.07
Yourself         0.96
himself          0.89
Someone          0.86
Adolf            0.83
myself           0.82
Himself          0.82
omever           0.81
somebody         0.80
someone          0.80
Activations Density: 0.206%