INDEX
Explanations
phrases indicating comparison or contrast
references to contextual relationships and additional information in the text
Negative Logits
Token          Logit
osponsors      -0.83
english        -0.76
ãĤ¨ãĥ«         -0.71
icons          -0.70
hops           -0.70
amia           -0.69
uay            -0.69
emale          -0.68
cycles         -0.68
conservancy    -0.66
Positive Logits
Token          Logit
,              0.93
,.             0.76
we             0.75
statement      0.72
scenario       0.72
caveat         0.71
affirmation    0.70
realization    0.69
logic          0.68
limitation     0.68
Activations Density 0.093%
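
For readers unfamiliar with the density figure, here is a minimal sketch of one common way to compute it: the fraction of token positions where the feature's activation is above zero. The page does not say how its value was computed, so the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, chosen only so the arithmetic reproduces 0.093%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    The dashboard's exact definition is not given in the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 93 of 100,000 tokens.
sample = [1.5] * 93 + [0.0] * (100_000 - 93)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.093% corresponds to the feature firing on roughly one token in 1,075.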