INDEX
Explanations
news articles or publications
instances of commas or punctuation in a text
Negative Logits
Availability    -0.67
autop           -0.63
appro           -0.62
ulz             -0.61
fantasies       -0.61
ocry            -0.60
idols           -0.57
¬¼              -0.57
herent          -0.57
ãĥ¥             -0.55
Positive Logits
however         1.25
huh             1.11
meanwhile       1.08
although        1.05
though          0.98
albeit          0.91
namely          0.87
citing          0.81
moreover        0.81
whereas         0.78
Activations Density 1.614%