Explanations
Punctuation and formatting elements within the text
Negative Logits
již        -0.67
Doch       -0.66
omiast     -0.65
Jednak     -0.64
lecz       -0.60
,’’        -0.59
deoarece   -0.59
אשר        -0.59
Porém      -0.58
doch       -0.57
Positive Logits
FUCKING    0.97
fucking    0.93
basically  0.90
Basically  0.90
pretty     0.90
goddamn    0.88
REALLY     0.86
pretty     0.85
basically  0.85
Basically  0.83
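Logit lists like the two above are commonly produced by taking the dot product of a feature's decoder direction with each token's unembedding vector, then reading off the lowest and highest scores. The sketch below assumes that procedure; the function names, the 2-dimensional decoder vector, and the unembedding values are hypothetical toy data, not this feature's real weights.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a (hypothetical) feature decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}

def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative (ascending) and k most positive (descending) tokens."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return bottom, top

# Toy example with made-up values (assumption: 2-d residual stream, 4-token vocabulary).
decoder_dir = [1.0, 0.5]
unembed = {
    "basically": [0.6, 0.6],
    "pretty": [0.5, 0.4],
    "doch": [-0.4, -0.2],
    "lecz": [-0.3, 0.0],
}
scores = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(scores, k=2)
print("negative:", negative)
print("positive:", positive)
```

Real dashboards may normalize or center these scores differently; the sketch only shows the ranking idea.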
Activation Density: 0.415%
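Activation density is usually read as the fraction of sampled tokens on which the feature fires, i.e. has an activation above zero. A minimal sketch under that assumed definition; the function name and the activation sample are hypothetical, with the sample built so that the feature fires on 83 of 20,000 tokens, which reproduces the 0.415% figure above.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Hypothetical sample: 20,000 tokens, the feature fires on every 240th token (83 tokens total).
acts = [0.0] * 20_000
for i in range(0, 83 * 240, 240):
    acts[i] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")
```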