INDEX
Explanations
phrases indicating the beginning or introduction of a topic or article
phrases indicating the speaker's perspective or personal involvement
Negative Logits
minds        -0.61
lifestyles   -0.61
alore        -0.61
Lives        -0.60
utic         -0.59
cedes        -0.58
Berry        -0.57
Alert        -0.57
Wise         -0.56
harms        -0.56
Positive Logits
'm           1.05
apologise    1.01
apologize    0.98
summarize    0.98
briefly      0.89
propose      0.89
introduce    0.84
reiterate    0.84
reprint      0.81
reproduce    0.80
Activations Density 0.330%