INDEX
Explanations
The beginning of text segments, indicated by the token '<bos>'
Negative Logits
Token           Logit
Vidite          -1.27
Италијани       -1.02
Wikimedijinoj   -0.89
featureID       -0.87
tartalomajánló  -0.83
himo            -0.83
RTEE            -0.83
ImageContext    -0.83
Geplaatst       -0.81
Portale         -0.80
Positive Logits
Token           Logit
2               0.65
1               0.56
3               0.51
</h2>           0.51
0               0.49
4               0.47
5               0.47
7               0.45
comfort         0.45
8               0.44
Activations Density: 0.000%