Explanations
adverbs ending in -ly
adverbs and related modifiers
Negative Logits
Pirate      -0.65
treasure    -0.64
plate       -0.62
ocene       -0.61
Fairy       -0.60
letter      -0.60
fr          -0.59
vest        -0.58
mouth       -0.57
Moonlight   -0.57
Positive Logits
terness     0.86
iration     0.73
conclud     0.71
olate       0.71
uthor       0.68
ccording    0.68
speaking    0.67
entimes     0.67
trained     0.67
compr       0.66
Activations Density 0.043%