Explanations
phrases related to personal attributes or traits
connective phrases and discourse markers indicating relationships and conditions in sentences
Negative Logits
spur        -0.74
ussions     -0.71
ullivan     -0.70
ioxide      -0.69
ISION       -0.68
USS         -0.67
ãĥ¯ãĥ³      -0.66
izens       -0.65
isions      -0.65
rils        -0.63
POSITIVE LOGITS
albeit        0.77
albeit        0.76
discipl       0.73
specializing  0.72
persecuted    0.70
although      0.69
Sly           0.68
liar          0.68
pretending    0.67
educator      0.66
Activations Density: 0.491%