INDEX
Explanations
mentions of the flu, reactors, and science and medicine
Negative Logits
Token    Logit
refer    -0.63
real     -0.61
(“       -0.61
von      -0.59
’,       -0.56
rent     -0.54
(‘       -0.53
med      -0.53
in       -0.52
sin      -0.51
Positive Logits
Token          Logit
trasparente    1.10
:✨             1.09
vermelha       1.02
eletrônico     1.02
polvere        1.02
dourada        0.99
isolado        0.97
Cæsar          0.96
romántica      0.95
anún           0.94
Activations Density: 2.664%