Explanations
phrases indicating the experience of never doing something
Negative Logits
always        -0.65
sempre        -0.58
toujours      -0.58
(not shown)   -0.57
حياته         -0.56
constantly    -0.56
πάντα         -0.54
siempre       -0.53
wciąż         -0.53
continually   -0.52
Positive Logits
theless       1.06
again         0.81
more          0.81
Again         0.75
ceases        0.74
ending        0.74
again         0.72
Again         0.67
mind          0.67
AGAIN         0.65
Activations Density: 0.125%