INDEX
Explanations
punctuation marks and their associated contexts
Negative Logits
”!          -0.76
již         -0.72
maktadır    -0.64
Doch        -0.63
Doch        -0.63
Jednak      -0.62
Somit       -0.62
            -0.60
sculptured  -0.60
…………………………………………  -0.59

Positive Logits
IIRC        1.11
iirc        1.11
fucking     1.01
fucking     0.99
goddamn     0.99
fucked      0.99
shitty      0.98
FUCKING     0.97
fuck        0.97
fuck        0.92
Activations Density 0.408%