Explanations
expressions describing a particular manner or approach of doing things
phrases that express a manner or method of doing something
Negative Logits
¥µ          -0.72
rament      -0.72
incinn      -0.71
encer       -0.70
Bei         -0.69
inus        -0.67
avorite     -0.66
¥ŀ          -0.65
ishable     -0.65
anmar       -0.64
Positive Logits
abl          0.76
finding      0.75
forward      0.75
ward         0.73
fare         0.70
somew        0.70
resembles    0.68
WARD         0.67
hered        0.67
resembling   0.66
Activations Density 0.031%