Explanations
mentions or references to specific terms or topics within a longer text
references to previous statements or mentions in a discussion
Negative Logits
¯¯¯¯¯¯¯¯  -0.93
¯¯  -0.76
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯  -0.73
sis  -0.72
heric  -0.72
sett  -0.71
ccording  -0.70
¯¯¯¯  -0.70
zers  -0.70
ï¸  -0.69
Positive Logits
mentioning  0.94
mentions  0.91
lihood  0.82
Kislyak  0.81
mention  0.76
aloud  0.74
enance  0.73
prominently  0.66
cliffe  0.64
Vaughn  0.63
Activation density: 0.049%