Explanations
Words related to importance, significance, or consequence.
Negative Logits
<bos>        -1.69
↵↵           -0.57
nemlig       -0.53
themſelves   -0.48
χρήση        -0.48
kullanılır   -0.47
accanto      -0.47
frumos       -0.46
kuitenkin    -0.46
natale       -0.45
Positive Logits
AddTagHelper   0.94
مشين           0.84
tvguidetime    0.70
óc             0.69
findpost       0.69
AnchorStyles   0.68
).__           0.66
__*/           0.65
Sharper        0.63
囗             0.62
Activations Density: 1.041%