Explanations
Health-related content, particularly medical studies and events involving physical harm
Negative Logits
RELE          -0.75
Uni           -0.68
infographic   -0.68
wholesale     -0.67
ãĥĩãĤ£        -0.66
liv           -0.66
Creat         -0.65
apr           -0.62
ãĤ´ãĥ³        -0.61
Unle          -0.60
Positive Logits
anyahu        0.81
rette         0.79
orf           0.79
rient         0.77
ork           0.75
ady           0.75
intel         0.74
jab           0.74
vals          0.73
ipal          0.73
Activations Density: 0.282%
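
Two entries in the negative logits list (ãĥĩãĤ£ and ãĤ´ãĥ³) look like raw vocabulary strings from a byte-level BPE tokenizer rather than readable text. The sketch below shows one way to turn such strings back into the characters they encode. It assumes the underlying model uses the GPT-2 style byte-to-unicode table, which this page does not state; the helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte value to a printable character.

    Assumption: the tokenizer behind this feature uses this table.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Convert a byte-level vocabulary string back to UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Partial byte sequences are possible for sub-character tokens,
    # so undecodable bytes are replaced instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĩãĤ£", "ãĤ´ãĥ³"]:
        print(tok, "->", decode_token(tok))
```

Under that assumption, the two entries decode to the katakana fragments ディ and ゴン. Plain ASCII tokens such as RELE pass through unchanged, since printable ASCII bytes map to themselves in this table.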