INDEX
Explanations
phrases related to surprising or shocking information
references to surprise or shock directed towards individuals or groups
Negative Logits
disappearing              -0.62
elevation                 -0.61
backups                   -0.59
ampl                      -0.58
minus                     -0.58
mut                       -0.55
ord                       -0.55
BuyableInstoreAndOnline   -0.55
github                    -0.55
OPA                       -0.54
Positive Logits
selves          0.95
jah             0.94
hematically     0.84
DERR            0.78
antics          0.75
[*              0.74
alike           0.72
sensibilities   0.71
aughs           0.70
oppers          0.70
Activations Density 0.298%