Explanations
questions and affirmative statements
Negative Logits
↵    -0.42
↵    -0.25
 ↵    -0.21
े↵    -0.19
ี↵    -0.18
ा↵    -0.17
<|end_of_text|>    -0.17
ी↵    -0.17
↵↵    -0.17
ี้↵    -0.17
Positive Logits
odore    0.33
Their    0.26
adays    0.24
Your    0.22
atre    0.22
You    0.21
You    0.21
Yourself    0.20
Your    0.20
These    0.19
Activations Density 0.564%
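
As a rough illustration of what a density figure like this usually means, the sketch below computes the fraction of token positions on which a feature's activation is nonzero. The function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration; the dashboard's actual definition and sampling procedure are not stated in this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is counted per token position; the dashboard
    may define or sample it differently.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1000 token positions, the feature fires on 6 of them.
sample = [0.0] * 994 + [0.8, 1.2, 0.3, 2.1, 0.5, 0.9]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.564% would correspond to the feature firing on roughly 5 or 6 of every 1,000 sampled tokens.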