INDEX
Explanations
No specific features or activations, indicating a lack of meaningful content in the examined text
Code or technical syntax
Negative Logits
purpoſe   -1.59
pleaſure  -1.57
Anſ       -1.53
Efq       -1.48
houſe     -1.46
raiſ      -1.46
Houſe     -1.44
Majefty   -1.42
Jefus     -1.41
Diſ       -1.41
Positive Logits
<eos>     0.64
↵↵        0.56
-         0.54
«         0.54
↵         0.53
«         0.52
»         0.46
saga      0.44
».        0.42
et        0.40
Activation density: 0.108%