Explanations
comparative phrases indicating a significant increase or degree
Negative Logits
Token     Logit
ãĥĨãĥ«    -0.17
sworth    -0.17
omer      -0.16
dep       -0.15
rais      -0.15
iness     -0.14
ebi       -0.14
asaki     -0.14
_PRI      -0.14
spot      -0.14
Positive Logits
Token     Logit
-grand    0.21
åı·       0.19
ölçüde    0.18
sword     0.17
spender   0.16
odus      0.16
687       0.16
dane      0.15
ened      0.15
atsby     0.15
Activations Density: 0.030%