Explanations
adjectives related to exaggeration or comparison
Negative Logits
Token      Logit
ashtra     -0.80
afety      -0.78
iard       -0.69
ascript    -0.68
utical     -0.66
tery       -0.66
anmar      -0.65
Ĥİ         -0.65
gans       -0.65
essor      -0.64
Positive Logits
Token      Logit
Savior     0.75
Irish      0.74
Gleaming   0.71
Sax        0.71
ellation   0.70
Eye        0.69
icket      0.69
Strip      0.68
Letter     0.68
Led        0.67
Activations Density 0.065%
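
The density figure above can be read as the share of token positions on which the feature fires. The sketch below shows one way such a number might be computed. It is an illustration only: the function name `activation_density`, the zero threshold, and the sample data are all assumptions, not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means active positions / total positions.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 13 active positions out of 20,000 tokens.
sample = [0.0] * 19_987 + [1.5] * 13
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.065%
```

A feature this sparse fires on roughly one token in every 1,500, which is why only a handful of top-activating examples usually characterize it.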