INDEX
Explanations
phrases indicating warnings or cautionary sentiments
Negative Logits
sogar    -0.16
adla     -0.15
imbus    -0.14
zwar     -0.14
ITO      -0.14
really   -0.14
nawet    -0.14
adlo     -0.14
zp       -0.14
.ta      -0.14
Positive Logits
nor            0.33
Nor            0.28
Nor            0.26
nor            0.24
anymore        0.21
occasionally   0.19
NOR            0.17
sondern        0.16
Norris         0.16
occasional     0.16
Activations Density: 0.137%