Negative Logits
olstadt    -0.52
следова    -0.50
sc         -0.50
en         -0.50
zab        -0.49
zb         -0.49
HAPP       -0.48
happy      -0.48
teng       -0.48
random     -0.47
Positive Logits
joke        1.13
joke        1.12
Joke        1.11
joking      1.08
jokes       1.02
Jokes       0.98
jokes       0.98
Joke        0.98
Jokes       0.90
joked       0.85
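Lists like the two above are often produced by projecting a feature's decoder direction through the model's unembedding matrix, then keeping the most negative and most positive entries. The sketch below assumes that method. The function name top_logit_effects and the inputs w_dec_row, w_u and vocab are hypothetical, and nothing here confirms that these particular values were computed this way.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by a feature's direct logit effect.

    Assumes w_dec_row has shape (d_model,), w_u has shape (d_model, vocab_size),
    and vocab is a list of token strings of length vocab_size.
    """
    # Project the feature's decoder direction onto every vocabulary logit.
    logits = np.asarray(w_dec_row) @ np.asarray(w_u)  # shape (vocab_size,)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with d_model = 2 and a four-token vocabulary.
w_dec_row = np.array([1.0, 0.0])
w_u = np.array([[0.5, -1.0, 2.0, 0.0],
                [3.0, 3.0, 3.0, 3.0]])
vocab = ["a", "b", "joke", "c"]
neg, pos = top_logit_effects(w_dec_row, w_u, vocab, k=2)
print(neg)  # [('b', -1.0), ('c', 0.0)]
print(pos)  # [('joke', 2.0), ('a', 0.5)]
```

Because the second row of the toy w_u is orthogonal to w_dec_row, only the first row contributes, which makes the ranking easy to check by hand.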
Activation density: 0.008%
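Activation density is commonly read as the share of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the function name activation_density and the threshold parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of positions whose activation exceeds threshold."""
    acts = np.asarray(acts)
    # Boolean mask of "firing" positions; its mean is the firing fraction.
    return float(np.mean(acts > threshold)) * 100.0

# 8 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:8] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.008%
```

At this density the feature fires on roughly 1 token in 12,500, so a very sparse feature like this one needs a large token sample before its density estimate is stable.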