INDEX
Explanations
high activation values associated with specific medical or scientific terminology
Negative Logits
$_[      -0.71
DeV      -0.71
Vill     -0.69
운       -0.67
tfrac    -0.67
Goy      -0.67
er       -0.66
Gerr     -0.66
tam      -0.65
Osh      -0.64
Positive Logits
})*/     1.30
}))      1.27
]")]     1.23
]})      1.23
})()     1.22
}))      1.19
']")     1.12
})));    1.11
)})      1.08
}])      1.08
Activations Density: 0.166%