Explanations
phrases indicating pointed out or highlighted information
phrases indicating observation or awareness
Negative Logits
Token     Logit
prus      -0.68
estern    -0.68
izens     -0.61
rig       -0.61
jing      -0.61
ococ      -0.61
rency     -0.59
addons    -0.59
jug       -0.59
rang      -0.58
Positive Logits
Token     Logit
,.        0.91
(),       0.86
,         0.83
,,        0.80
.,        0.76
wont      0.70
,-        0.69
âĶĢ       0.68
terday    0.67
(),       0.66
Activations Density 0.078%
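
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of sampled tokens on which the feature has an activation above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is "active tokens / total tokens"; the source page
    does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample (not data from the page): 100,000 tokens, 78 of them active.
sample = [0.0] * 99_922 + [1.0] * 78
print(f"{activation_density(sample):.3%}")
```

Under that assumed definition, a density of 0.078% corresponds to roughly 78 active tokens per 100,000 sampled.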