INDEX
Explanations
phrases referring to specific situations or instances
phrases that indicate common situations or experiences
Negative Logits
Token       Logit
aeus        -0.87
shaw        -0.80
culosis     -0.71
arnaev      -0.70
matter      -0.68
kamp        -0.67
endor       -0.67
onis        -0.66
Rated       -0.65
strength    -0.64
Positive Logits
Token       Logit
occasions   1.06
rare        1.05
situations  1.04
moments     0.91
things      0.91
pesky       0.89
cases       0.88
annoying    0.84
weird       0.80
acron       0.79
Activations Density 0.073%
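
The page does not say how the activation density figure is computed. A common reading, assumed here, is the percentage of sampled tokens on which the feature's activation is above zero. The sketch below illustrates that reading only: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and do not come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly greater than threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 tokens, 7 of which activate the feature.
sample = [0.0] * 9993 + [0.4, 1.2, 0.1, 2.3, 0.8, 0.05, 3.1]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.073% would mean the feature fires on roughly 7 of every 10,000 tokens, which fits a sparse feature tied to a narrow class of phrases.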