INDEX
Explanations
phrases or words that suggest quotes or attributions in text
punctuation marks or special characters
Negative Logits
Token         Logit
recogn        -0.88
conservancy   -0.82
userc         -0.76
pressing      -0.76
agate         -0.75
manip         -0.72
cogn          -0.71
ascending     -0.70
descending    -0.69
recognise     -0.68
Positive Logits
Token         Logit
Ibid          0.96
————          0.80
ONSORED       0.80
————————      0.78
rik           0.77
Sah           0.75
said          0.74
Meh           0.70
hide          0.70
Wilson        0.70
Activations Density: 0.048%