INDEX
Explanations
phrases discussing societal issues and their implications
Negative Logits
Administrativna    -0.39
herzog             -0.37
の方が             -0.35
のほうが           -0.35
ритори             -0.35
whether            -0.35
ようになった       -0.34
skinned            -0.34
同様に             -0.34
דיה                -0.34
Positive Logits
already            0.97
otherwise          0.91
already            0.89
ohnehin            0.86
otherwise          0.82
Already            0.80
Already            0.79
altrimenti         0.77
ALREADY            0.76
chances            0.72
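A minimal sketch of how positive and negative logit lists like the two above are commonly produced, assuming (not confirmed by this page) that each value is the dot product of the feature's decoder direction with a token's column of the unembedding matrix. The function names `logit_effects` and `top_and_bottom` and the toy matrices are hypothetical and only illustrate the ranking step.

```python
# Hypothetical sketch: score every vocabulary token by projecting a feature's
# decoder direction through the unembedding matrix, then keep the extremes.
# Assumption: this mirrors how the dashboard's logit lists are computed.

def logit_effects(decoder_dir, unembed, vocab):
    """decoder_dir: d floats; unembed: d rows x V columns; vocab: V tokens."""
    d = len(decoder_dir)
    effects = {}
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with token j's unembedding column.
        effects[tok] = sum(decoder_dir[i] * unembed[i][j] for i in range(d))
    return effects

def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = ranked[::-1][:k]  # descending: largest boosts first
    negative = ranked[:k]        # ascending: strongest suppressions first
    return positive, negative

# Toy usage with a 2-dimensional model and a 3-token vocabulary.
effects = logit_effects([1.0, -0.5],
                        [[0.2, 0.9, -0.4],
                         [0.6, -0.2, 0.1]],
                        ["a", "b", "c"])
pos, neg = top_and_bottom(effects, 1)
print(pos, neg)
```

Because the dictionary is keyed by token string, tokens that only differ in invisible ways (for example a leading space) must be kept distinct in a real vocabulary, which may be why "already" and "Already" each appear twice above.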
Activation density: 0.590%
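Activation density is usually read as the percentage of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the page does not state the sample size or threshold, and `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: percentage of token positions where a feature is active.
# Assumption: "active" means activation > threshold (default 0.0).

def activation_density(activations, threshold=0.0):
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 59 active positions out of 10,000 sampled tokens gives 0.59 percent.
print(activation_density([1.0] * 59 + [0.0] * 9941))
```

Under this reading, a density of 0.590% means the feature is active on roughly 6 of every 1,000 tokens.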