INDEX
Explanations
terms related to healthcare, medical treatments, and government healthcare policies
Negative Logits
luaj            -0.78
ngth            -0.77
opez            -0.70
ividual         -0.66
isively         -0.62
raper           -0.62
aneers          -0.61
asers           -0.59
ifferent        -0.59
elve            -0.58
Positive Logits
happening       0.94
true            0.81
untrue          0.80
why             0.77
blasphemy       0.75
unacceptable    0.75
understandable  0.75
SPONSORED       0.71
heresy          0.71
nonsense        0.70
Activations Density 2.084%