INDEX
Explanations
phrases expressing the difficulty or ease of a task or process
Negative Logits
Token            Logit
frank            -0.17
Drug             -0.14
Drug             -0.14
عد               -0.14
iscrim           -0.14
thren            -0.13
justification    -0.13
bell             -0.13
Datum            -0.13
Bell             -0.13
Positive Logits
Token                                        Logit
easier                                        0.52
asier                                         0.36
easiest                                       0.32
easy                                          0.31
easy                                          0.29
Easy                                          0.28
Easy                                          0.27
eas                                           0.27
fácil ("easy" in Spanish/Portuguese)          0.27
易 ("easy" in Chinese; raw BPE form æĺĵ)      0.26
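Logit values like those listed above are typically obtained by projecting the feature's decoder direction onto the model's unembedding matrix: each token's score is the dot product of the two vectors. The sketch below assumes that convention with no final LayerNorm or rescaling. The function name `logit_effects` and the 2-dimensional vectors are hypothetical toy data, not weights from the model behind this card.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_row: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Score every token by the dot product of the feature's decoder
    direction with that token's unembedding vector, then return the
    k most negative and k most positive (token, score) pairs."""
    scores = {
        token: sum(d * u for d, u in zip(decoder_row, column))
        for token, column in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]                   # lowest scores first
    positive = list(reversed(ranked))[:k]   # highest scores first
    return negative, positive


# Hypothetical 2-dimensional toy data, not taken from any real model.
decoder_row = [1.0, 0.0]
unembed = {
    "easier": [0.6, 0.0],
    "easy": [0.5, 0.1],
    "bell": [-0.1, 0.9],
    "Drug": [-0.2, 0.3],
}
negative, positive = logit_effects(decoder_row, unembed, k=2)
print("negative:", negative)
print("positive:", positive)
```

With a real model the same projection runs over the full vocabulary, which is why near-duplicate entries (for example two "easy" rows) can appear: they are distinct vocabulary entries that render identically once whitespace is lost.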
Activation density: 0.062%
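A density of 0.062% means the feature fires on roughly 62 of every 100,000 tokens, or about 1 token in 1,600. A minimal sketch of the calculation, assuming "active" means an activation strictly above zero. The `threshold` parameter and the sample data are hypothetical, added only to make that assumption explicit.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 62 firing tokens out of 100,000.
sample = [0.0] * 99_938 + [1.0] * 62
print(f"{activation_density(sample):.3f}%")  # prints "0.062%"
```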