INDEX
Explanations
words related to medical conditions and treatments, especially surgery and injuries
words related to bodily functions and processes
Negative Logits
shore       -0.71
genders     -0.69
leash       -0.61
sight       -0.60
fact        -0.59
endor       -0.58
reins       -0.57
hens        -0.57
PUT         -0.57
Cherokee    -0.56
Positive Logits
berus       1.15
cer         0.99
oute        0.90
ozo         0.89
ulent       0.85
culus       0.84
nel         0.84
pillar      0.83
ate         0.82
anos        0.81
Activations Density: 0.031%