Explanations
questions ending in question marks
Negative Logits
Token         Logit
ursions       -0.88
seizures      -0.79
tours         -0.79
overfl        -0.79
syn           -0.76
deployments   -0.76
enhancements  -0.75
urances       -0.75
inund         -0.75
repairs       -0.74
Positive Logits
Token         Logit
whoever       0.92
Adolf         0.91
Someone       0.89
Yourself      0.86
Herman        0.86
Whoever       0.83
Donald        0.82
Carly         0.81
Somebody      0.81
Darth         0.80
Activations Density: 0.394%