INDEX
Explanations
occurrences of the word "never."
NEGATIVE LOGITS
vidia     -0.81
AOC       -0.76
Lübeck    -0.73
deaux     -0.73
்ச        -0.72
Muffins   -0.72
PIB       -0.71
oforte    -0.71
Otis      -0.71
شة        -0.70
POSITIVE LOGITS
never     2.39
Never     2.21
NEVER     2.18
never     2.17
Never     2.13
NEVER     2.06
Nunca     1.61
Nunca     1.60
nunca     1.55
nunca     1.36
Activations Density: 0.031%