INDEX
Explanations
references to specific entities or events within a text
punctuation and special characters, particularly parentheses and brackets
Negative Logits
princ       -0.80
¥ŀ          -0.76
imperson    -0.75
exha        -0.74
undermin    -0.72
administ    -0.68
ĪĴ          -0.67
tremend     -0.66
isot        -0.65
manent      -0.65
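Feature dashboards of this kind commonly derive the logit lists by taking the feature's decoder direction and multiplying it by the model's unembedding matrix, then reading off the most negative and most positive entries. The sketch below shows that projection under that assumption. It ignores the final layer norm, and the vectors are made up (chosen only so the output has the same shape as the tables here). `logit_effects`, `top_and_bottom`, `decoder_dir` and `unembed` are hypothetical names, not part of any dashboard's API.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: dot product of the feature's decoder direction with each
# token's unembedding vector, i.e. how much the feature pushes that logit up or down.
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

# Hypothetical helper: the k most suppressed and k most promoted tokens.
def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive

# Made-up toy vectors (3-dimensional residual stream), not real model weights.
decoder_dir = [0.5, -1.0, 0.25]
unembed = {
    "<|endoftext|>": [1.0, -1.0, 0.0],
    "↵": [0.5, -1.0, 0.4],
    "princ": [-0.4, 0.6, 0.0],
    "undermin": [0.0, 0.5, -1.0],
    "pic": [0.2, 0.0, 0.0],
}

neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
for tok, val in neg + pos:
    print(f"{tok:<15}{val:+.2f}")
```

Because the effect is a plain dot product, the sign of each entry says only whether the feature's write direction aligns with or against that token's unembedding; it does not say how often the feature fires.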
Positive Logits
<|endoftext|>                       1.53
↵                                   1.43
↵↵                                  1.39
[/                                  1.08
↵Âł                                 0.96
********************************    0.92
âĶľâĶĢâĶĢ                           0.90
||||                                0.90
pic                                 0.90
Originally                          0.88
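Strings such as `âĶľâĶĢâĶĢ`, `Âł`, `ĪĴ` and `¥ŀ` look like GPT-2-style byte-level BPE tokens, where every byte is shown as a printable stand-in character. Assuming the model uses that byte-to-character table (the strings strongly suggest it, but the dashboard does not say so), the sketch below maps them back to text. `âĶľâĶĢâĶĢ` decodes to the box-drawing run `├──` and `Âł` to a non-breaking space. `ĪĴ` and `¥ŀ` are lone UTF-8 continuation bytes, fragments of longer characters, so they cannot be decoded on their own. The `↵` glyph is the dashboard's rendering of a newline and is not part of the byte table, so strip it before decoding. `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict:
    """GPT-2-style byte-to-character table: printable bytes map to themselves,
    the remaining bytes are shifted to code points 256 and up, in byte order."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}

def decode_token(token: str) -> str:
    """Hypothetical helper: turn a displayed token back into text.
    Bytes that are not valid UTF-8 on their own become U+FFFD."""
    raw = bytes(CHAR_TO_BYTE[c] for c in token)
    return raw.decode("utf-8", errors="replace")

print(decode_token("âĶľâĶĢâĶĢ"))       # box-drawing characters
print(repr(decode_token("Âł")))      # a non-breaking space
print(repr(decode_token("ĪĴ")))      # replacement characters (fragment)
```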
Activation Density: 0.129%
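Activation density is the share of sampled tokens on which the feature fires, so 0.129% means roughly 1 token in 775. The sketch below assumes "fires" means an activation strictly above zero. The dashboard's exact threshold and token sample are not stated here, and the toy input is invented. `activation_density` is a hypothetical helper name.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations given")
    return 100.0 * active / total

# Invented toy sample: 129 firing tokens out of 100,000.
acts = [1.0] * 129 + [0.0] * (100_000 - 129)
density = activation_density(acts)
print(f"{density:.3f}%")
```

Counting with a running loop instead of building a list keeps memory flat when the activations are streamed from a large corpus.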