Explanations
words related to entertainment or humor
Negative Logits
ODO     -0.07
ricks   -0.07
Dude    -0.07
iks     -0.06
odore   -0.06
odo     -0.06
indi    -0.06
kyt     -0.06
กล      -0.06
ikip    -0.06
Positive Logits
THIS     0.09
this     0.09
this     0.08
thì      0.08
hãy      0.08
THIS     0.08
(this    0.07
questo   0.07
this     0.07
,this    0.07
Activations Density: 0.032%