INDEX
Explanations
comparisons between quantities, characteristics, or actions
comparative statements regarding various topics, focusing on contrasting entities or conditions
Negative Logits
awa          -0.63
Topics       -0.59
afety        -0.56
details      -0.56
Topic        -0.55
escription   -0.54
later        -0.53
kindred      -0.51
={           -0.51
uish         -0.50
Positive Logits
combined       1.16
did            1.01
does           0.94
ever           0.94
realizes       0.84
EVER           0.84
suggests       0.81
did            0.80
nor            0.80
counterparts   0.80
Activations Density: 0.320%