INDEX
Explanations
mentions of the city "Tucson"
references to specific elements related to health conditions and locations
Negative Logits
train      -0.83
phal       -0.75
Cand       -0.75
士         -0.75
phe        -0.74
Versions   -0.71
vous       -0.71
heads      -0.70
fol        -0.67
handler    -0.67
Positive Logits
olkien     1.22
ribute     1.02
olerance   1.01
ango       0.97
oday       0.95
akedown    0.95
urtle      0.95
uple       0.93
oxin       0.93
ractor     0.92
Activations Density 0.037%