INDEX
Explanations
expressions of personal beliefs and values regarding responsibility and communication
Negative Logits
Dont      -0.89
doesnt    -0.88
didnt     -0.86
Dont      -0.84
dont      -0.84
fuckin    -0.84
DONT      -0.80
couldnt   -0.79
isnt      -0.79
wouldnt   -0.78
Positive Logits
>>           0.69
--           0.63
♪            0.59
kommenden    0.46
"--          0.43
♪            0.41
vergangenen  0.40
ontem        0.40
("--         0.39
quehanna     0.38
Activations Density 0.036%