Explanations
phrases indicating urgency or frequency
Negative Logits
æľĹ      -0.18
hardt    -0.17
cap      -0.16
Lair     -0.16
fell     -0.16
Mate     -0.15
rejo     -0.15
761      -0.14
antee    -0.14
Borg     -0.14
Positive Logits
ãĥ¼ãĥ³   0.19
uen      0.17
pond     0.15
pressed  0.14
GOODMAN  0.14
jal      0.14
RESSED   0.14
arend    0.14
irth     0.14
IDA      0.14
Activations Density 0.004%