INDEX
Explanations
words related to medication or medical treatments
words related to comedy or humorous content
Negative Logits
Token          Logit
atchewan       -0.79
accompan       -0.70
İĭ             -0.69
ific           -0.69
animous        -0.69
falls          -0.67
ippi           -0.67
hematically    -0.67
ership         -0.66
arers          -0.66
Positive Logits
Token          Logit
edy            0.95
LINE           0.80
deen           0.73
ge             0.73
geoning        0.70
lig            0.70
dy             0.68
der            0.66
zzy            0.65
isal           0.64
Activations Density 0.014%