Explanations
also, alternatively, additionally
Negative Logits
Token       Value
sciutto     0.40
lefthar     0.38
adece       0.36
same        0.36
େ           0.35
albeit      0.35
δου         0.35
exactly     0.35
"+"|".      0.34
viewpoint   0.34
Positive Logits
Token              Value
↵↵↵↵↵↵             0.54
↵↵↵↵↵              0.52
↵↵↵↵↵↵↵            0.50
Also               0.49
Außerdem           0.48
↵↵↵↵               0.47
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵   0.47
außerdem           0.46
↵↵↵                0.46
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵    0.45
Activation Density: 0.013%