INDEX
Explanations
specific adjectives and adverbs that convey intensity or emphasis
Negative Logits
abase        -0.15
wine         -0.15
burger       -0.13
çŃĴ          -0.13
_formula     -0.13
_completion  -0.13
-Allow       -0.13
gles         -0.13
abad         -0.13
pill         -0.13
Positive Logits
ingo         0.15
anten        0.15
-overlay     0.15
ogo          0.15
ANJI         0.14
oni          0.14
éĸī          0.13
etta         0.13
Ĥæķ°         0.13
ANGO         0.13
Activations Density 0.453%