Explanations
intensifiers or modifiers that emphasize degree or extent
Negative Logits
ale         -0.17
ning        -0.14
manship     -0.14
rob         -0.14
hammer      -0.14
dens        -0.14
ilig        -0.14
/board      -0.13
ses         -0.13
rai         -0.13
Positive Logits
/full       0.18
entirely    0.17
ajan        0.17
aker        0.15
completely  0.15
jadi        0.15
GINE        0.14
ifax        0.14
yscale      0.14
หมด         0.14
Activations Density 0.047%