Explanations
phrases indicating a summary or conclusion
phrases indicating collective experiences or shared situations
Negative Logits
ror        -0.72
anymore    -0.71
whisper    -0.61
lev        -0.61
gey        -0.60
opp        -0.59
ça         -0.58
panic      -0.58
":""},{"   -0.58
drm        -0.57
Positive Logits
totals     0.93
Ĥª         0.83
totaling   0.82
ogether    0.81
total      0.78
comprise   0.77
amassed    0.75
suffice    0.74
tidy       0.73
roughly    0.72
Activations Density 0.571%