INDEX
Explanations
references or citations within a text
references to citations or sources used in arguments
Negative Logits
ensibly     -0.77
cos         -0.75
Sphere      -0.71
byss        -0.69
mire        -0.69
ateurs      -0.68
icion       -0.67
mid         -0.66
encia       -0.66
quer        -0.65
Positive Logits
similarities       0.95
preced             0.95
accomplishments    0.93
debunked           0.90
example            0.87
similarity         0.85
precedent          0.85
examples           0.84
shortcomings       0.83
inaccur            0.83
Activation Density: 0.324%