INDEX
Explanations
dates specified in the format "DAY MONTH YEAR"
punctuation and conjunctions in text
Negative Logits
Token        Logit
thood        -0.77
olo          -0.69
izen         -0.67
aunted       -0.66
forcement    -0.66
ulatory      -0.65
obos         -0.62
iety         -0.62
antz         -0.61
undai        -0.61
Positive Logits
Token        Logit
meaning      1.08
huh          1.01
whereas      1.00
hence        0.94
but          0.88
yeah         0.83
yes          0.81
haha         0.80
yet          0.80
Meaning      0.79
Activations Density 0.443%
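
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of token positions on which the feature's activation is above zero; the page itself does not state the definition, and the function name, the threshold parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 1,000 token positions:
# 996 positions are silent and 4 positions fire.
sample = [0.0] * 996 + [0.3, 1.2, 0.7, 2.1]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.443% would mean the feature fires on roughly 4 or 5 of every 1,000 tokens.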