INDEX
Explanations
phrases related to mechanisms or processes in scientific contexts
Negative Logits
iaries      -0.65
hiro        -0.64
minster     -0.62
gets        -0.59
atana       -0.59
uden        -0.59
ield        -0.56
Parables    -0.55
oning       -0.55
hab         -0.54
Positive Logits
mechanism   0.81
mechanisms  0.77
whereby     0.76
utics       0.75
Mechan      0.74
witz        0.68
ptoms       0.66
workings    0.65
ality       0.63
symmetry    0.63
Activations Density: 11.207%