Explanations
sentences that express strong positive sentiments or admiration towards subjects
Negative Logits
Token     Logit
/Dk       -0.16
heim      -0.14
weren     -0.14
aren      -0.14
abant     -0.14
icum      -0.14
év        -0.13
ubbo      -0.13
fucked    -0.13
akeup     -0.13
Positive Logits
Token     Logit
ROCK      0.23
truly     0.22
rocks     0.22
Rocks     0.21
sure      0.21
totally   0.20
rivals    0.20
rival     0.19
rules     0.18
Rock      0.18
Activations Density: 0.139%