INDEX
Explanations
phrases indicating conclusions or results
Negative Logits
Pring      -0.84
lipop      -0.74
Okey       -0.73
LON        -0.71
Beep       -0.69
Alton      -0.69
'>{        -0.67
pee        -0.65
Waiting    -0.65
ateral     -0.65
Positive Logits
propOrder       1.13
thus            1.05
ly              1.04
thus            0.93
Hentet          0.90
Thus            0.88
Thus            0.86
autorytatywna   0.85
Hence           0.84
Hence           0.83
Activations Density 0.129%