INDEX
Explanations
words related to exaggeration or simplification
instances of the word "overs" or its variations, indicating a focus on the concept of oversimplification
Negative Logits
arc        -0.71
Nanto      -0.68
motions    -0.63
corridor   -0.62
Sho        -0.62
Dot        -0.62
Guard      -0.62
utmost     -0.61
theater    -0.61
tooth      -0.60
Positive Logits
overs      1.29
impl       1.14
aturated   0.98
lap        0.95
leep       0.90
amples     0.86
icro       0.84
laughter   0.80
ummer      0.78
olicited   0.76
Activations Density 0.005%