Explanations
specific structural elements and formatting in text
Negative Logits
Token              Logit
تقاوى              -0.64
cherchés           -0.56
haikusbot          -0.55
senhora            -0.54
SharedDtor         -0.53
betweenstory       -0.51
téléphonique       -0.50
Infórmanos         -0.50
ArgsConstructor    -0.49
⟬                  -0.49
Positive Logits
Token              Logit
↵↵↵                0.81
↵↵↵↵↵              0.69
↵↵↵↵               0.67
↵↵↵↵↵↵↵            0.65
↵↵↵↵↵↵↵↵↵          0.60
↵↵↵↵↵↵             0.60
↵↵↵↵↵↵↵↵           0.59
↵↵↵↵↵↵↵↵↵↵↵        0.58
Gros               0.53
uz                 0.52
Activations Density: 0.011%