INDEX
Explanations
phrases related to drugs or medicine
mentions of rugs
Negative Logits
Token           Logit
Highlands       -0.64
Reconstruction  -0.62
Eclipse         -0.62
Dresden         -0.62
Korean          -0.61
oral            -0.59
Petra           -0.59
Grail           -0.59
cav             -0.59
Peninsula       -0.59
Positive Logits
Token           Logit
rug             1.48
uay             0.90
ular            0.87
ulent           0.86
ulence          0.85
iosity          0.85
icz             0.85
iety            0.85
atism           0.83
osity           0.82
Activations Density: 0.005%