Explanations
instances of the word "same" followed by a positive integer, indicating similarity or repetition
references to similarity or repetition
Negative Logits
*=-         -0.76
ONSORED     -0.72
rend        -0.72
orial       -0.71
Democr      -0.70
Provided    -0.70
Lauder      -0.69
pac         -0.68
olate       -0.68
ç«          -0.67
Positive Logits
kinds       0.98
thing       0.97
exact       0.96
amount      0.89
sorts       0.88
fate        0.88
kind        0.84
vein        0.81
ol          0.78
sort        0.77
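Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the extremes. The minimal NumPy sketch below assumes a decoder vector `direction` of length d_model and an unembedding matrix `W_U` of shape (d_model, vocab_size). The helper name `top_logit_tokens` and all numbers are hypothetical toy values, not the weights behind this feature.

```python
import numpy as np

def top_logit_tokens(direction, W_U, vocab, k=10):
    """Rank tokens by how much a feature direction raises (positive)
    or lowers (negative) their logits. Hypothetical helper."""
    logits = direction @ W_U          # one logit contribution per vocab token
    order = np.argsort(logits)        # indices sorted ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy data (made-up values for illustration only)
vocab = ["kinds", "thing", "exact", "rend", "orial"]
W_U = np.array([[0.9, 0.6, 0.2, -0.5, -0.3],
                [0.1, 0.3, 0.6, -0.3, -0.1]])
direction = np.array([1.0, 1.0])

pos, neg = top_logit_tokens(direction, W_U, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Real dashboards may also fold in layer-norm scaling or other details; this sketch ignores them.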
Activation Density: 0.046%
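Activation density is the share of sampled tokens on which the feature fires, so 0.046% corresponds to roughly 46 tokens per 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero; the threshold and the helper name `activation_density` are assumptions, and the activation array is synthetic.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Synthetic example: 46 firing tokens out of 100,000
acts = np.zeros(100_000)
acts[:46] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")  # 0.046%
```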