Explanations
instances of the word "can" and its variations, indicating potential or ability
Negative Logits
ufe      -0.15
lette    -0.15
�s       -0.14
�        -0.14
jos      -0.14
elay     -0.14
Olsen    -0.13
γι       -0.13
arti     -0.13
rost     -0.13
Positive Logits
't       0.56
’t       0.52
neither  0.43
不了     0.35
never    0.32
not      0.31
cannot   0.31
不能     0.31
ikke     0.30
nicht    0.29
Activations Density 0.135%
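
Token strings in logit lists like these are taken from the tokenizer vocabulary, and byte-level BPE vocabularies store non-ASCII text in a printable byte alphabet, so a raw entry can look like mojibake. The sketch below shows one way to turn such an entry back into readable text. It assumes the vocabulary uses the GPT-2 style byte-to-unicode mapping, which this page does not confirm; `byte_alphabet` and `decode_token` are hypothetical helper names, not part of any dashboard API.

```python
def byte_alphabet():
    """Build the inverse of the GPT-2 style byte-to-unicode table (assumed mapping)."""
    # Bytes that are displayed as themselves.
    byte_values = (list(range(ord("!"), ord("~") + 1))
                   + list(range(ord("¡"), ord("¬") + 1))
                   + list(range(ord("®"), ord("ÿ") + 1)))
    code_points = byte_values[:]
    extra = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + extra)
            extra += 1
    return {chr(c): b for b, c in zip(byte_values, code_points)}


CHAR_TO_BYTE = byte_alphabet()


def decode_token(raw):
    """Decode a raw vocabulary string to text; return it unchanged if it is not in the byte alphabet."""
    if any(ch not in CHAR_TO_BYTE for ch in raw):
        return raw  # already readable text, e.g. a typographic apostrophe
    data = bytes(CHAR_TO_BYTE[ch] for ch in raw)
    # Partial UTF-8 sequences show up as the replacement character.
    return data.decode("utf-8", errors="replace")


print(decode_token("ä¸įäºĨ"))  # 不了
print(decode_token("ä¸įèĥ½"))  # 不能
```

Entries that decode to the replacement character are byte fragments of a longer character and cannot be recovered from the token alone.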