INDEX
Explanations
words related to improvement and progress
instances of the word "improvement."
Negative Logits
Token           Logit
printed         -0.65
NA              -0.63
da              -0.63
Vs              -0.62
individual      -0.61
Loaded          -0.61
Vengeance       -0.61
zie             -0.60
cient           -0.60
ordon           -0.59
Positive Logits
Token           Logit
improvement     1.27
Improvement     0.98
enhancement     0.97
jriwal          0.96
undermin        0.95
improvements    0.91
oldemort        0.91
eatures         0.89
deterioration   0.85
guiActiveUn     0.83
Activations Density: 0.011%