INDEX
Explanations
terms related to scientific concepts and medical research
preceding specific nouns
concepts and specific terms
Negative Logits
(blank)  -0.59
L  -0.52
haped  -0.51
R  -0.51
&  -0.50
'  -0.50
O  -0.49
pecific  -0.47
</strong>  -0.47
Le  -0.45
Positive Logits
تقاوى  0.91
featureID  0.91
houſe  0.88
Jefus  0.86
myſelf  0.84
againſt  0.83
ſche  0.81
NDEBUG  0.80
pinulongan  0.79
Reſ  0.79
Activations Density: 1.694%