INDEX
Explanations
phrases indicating the necessity or importance of actions or conditions
Negative Logits
gjenge        -0.47
CURIAM        -0.46
Begründung    -0.43
LikeLiked     -0.40
certeza       -0.39
remarquer     -0.38
Ähn           -0.38
lutar         -0.38
jelas         -0.38
merak         -0.38
Positive Logits
wise          1.34
best          1.10
wisest        1.08
wise          0.97
Wise          0.96
prudent       0.96
Wise          0.94
WISE          0.92
wiser         0.90
advisable     0.89
Activations Density 0.268%