INDEX
Explanations
conversational language and expressions of opinion
Negative Logits
.googlecode    -0.18
geek           -0.17
Zombies        -0.16
Prostit        -0.16
hookers        -0.16
OUCH           -0.15
Awesome        -0.15
ÏĢα            -0.15
Ãĥ             -0.15
xrange         -0.15
Positive Logits
stan           0.28
202            0.24
rn             0.22
tb             0.22
vibes          0.21
literal        0.20
covid          0.20
Literal        0.20
Tik            0.20
MCU            0.19
Activations Density 0.933%