INDEX
Explanations
instances where something is being recommended or proposed
Negative Logits
bers         -0.86
brance       -0.83
erning       -0.81
vation       -0.77
ership       -0.77
alde         -0.77
tin          -0.77
aos          -0.76
ctors        -0.74
reth         -0.74
Positive Logits
suggestions  0.84
Parenthood   0.76
Attempts     0.72
hints        0.72
suggested    0.72
introdu      0.71
unanimously  0.70
easing       0.69
hint         0.68
Magikarp     0.68
Activations Density 0.035%