INDEX
Explanations
phrases indicating substitution or replacement
Negative Logits
ells      -0.77
Dynamics  -0.66
Skies     -0.65
Rebell    -0.64
Drift     -0.60
knit      -0.60
damned    -0.60
Beans     -0.59
istor     -0.58
izoph     -0.58
Positive Logits
thereof      1.11
itial        0.76
ngth         0.75
uations      0.75
INGTON       0.73
lieu         0.72
uary         0.71
aldehyde     0.70
guiActiveUn  0.70
isons        0.69
Activations Density: 0.005%