Explanations
adverbs indicating a significant degree or extent
intensifiers expressing significant or drastic change
Negative Logits
ifully      -0.89
sburgh      -0.81
afety       -0.80
elsen       -0.75
tein        -0.75
tale        -0.73
Memories    -0.72
awaru       -0.67
gerald      -0.67
plugin      -0.66
Positive Logits
impacted    0.99
impacting   0.94
reduce      0.93
reduces     0.93
effected    0.92
reduced     0.90
differ      0.88
altered     0.87
improves    0.86
improve     0.84
Activations Density: 0.050%