INDEX
Explanations
phrases related to comparisons or evaluations
phrases that highlight significant topics or concepts within a discussion
Negative Logits
iture          -0.89
anse           -0.77
ourt           -0.75
heit           -0.72
brance         -0.71
aeus           -0.70
CLE            -0.70
UME            -0.69
.............  -0.68
ablishment     -0.66
Positive Logits
reasons        1.41
coolest        1.29
biggest        1.24
earliest       1.24
hardest        1.20
easiest        1.20
greatest       1.18
drawbacks      1.18
strang         1.17
simplest       1.13
Activations Density 0.078%