INDEX
Explanations
comparative phrases or contrasting ideas
Negative Logits
매우        -0.53
Dernière    -0.53
غه          -0.52
eneste      -0.52
//          -0.50
einzig      -0.50
consulté    -0.49
Ế           -0.49
ardes       -0.48
&(          -0.47
Positive Logits
safer       1.10
stronger    1.03
richer      1.02
healthier   1.00
happier     0.99
clearer     0.98
slower      0.98
higher      0.96
harder      0.95
stiffer     0.95
Activations Density: 0.727%