INDEX
Explanations
phrases indicating cumulative or final conclusions
phrases indicating the aggregation or summation of elements
Negative Logits
rouse       -0.73
ially       -0.68
cknowled    -0.63
ial         -0.61
Mansion     -0.61
jah         -0.61
orney       -0.61
illery      -0.61
asus        -0.60
Focus       -0.59
Positive Logits
ļéĨĴ        0.78
akespe      0.76
¥           0.71
enario      0.69
uyomi       0.69
nces        0.68
dysph       0.66
river       0.65
ruciating   0.64
nerv        0.64
Activations Density 0.038%