INDEX
Explanations
negations or the word "not."
Negative Logits
insuffisamment    -0.56
cannot            -0.51
Organisateur      -0.49
tidak             -0.45
concluded         -0.44
even              -0.43
ikke              -0.43
không             -0.43
occasionally      -0.42
non               -0.41
Positive Logits
buy               0.57
Screen            0.56
not               0.52
Moon              0.51
:✨                0.50
crops             0.50
moon              0.49
Theme             0.49
proposition       0.49
noDo              0.49
Activations Density: 0.475%