Explanations
references to popular science fiction franchises and characters
Negative Logits
Theſe           -0.81
otomatig        -0.80
itſelf          -0.79
myſelf          -0.79
AssemblyTitle   -0.78
."));           -0.76
Monfieur        -0.76
kaynağından     -0.75
Anſ             -0.74
softmax         -0.73
Positive Logits
Leia                  0.54
Jedi                  0.48
kru                   0.48
Yoda                  0.48
II                    0.45
__                    0.45
(token not shown)     0.44
prises                0.44
"                     0.43
Han                   0.43
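One common way such positive and negative logit lists are produced is to project the feature's decoder direction onto every token's unembedding vector and keep the most extreme scores. The sketch below assumes exactly that plain dot-product projection; real dashboards may also fold in the model's final layer norm, which this skips. The function and parameter names (`feature_logit_effects`, `decoder_direction`, `unembedding`) and the toy vectors are hypothetical, for illustration only.

```python
from typing import Dict, List, Tuple


def feature_logit_effects(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score each token by the dot product of the (hypothetical) feature
    decoder direction with that token's unembedding vector, then return
    the k lowest-scoring and k highest-scoring tokens."""
    # Direct effect of the feature on each token's logit (no layer norm applied).
    scores = {
        token: sum(d * u for d, u in zip(decoder_direction, vec))
        for token, vec in unembedding.items()
    }
    # Ascending order: most negative first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional unembedding; values are made up for illustration.
    toy_unembed = {
        "Jedi": [1.0, 0.5],
        "Yoda": [0.8, 0.2],
        "softmax": [-0.9, -0.1],
    }
    neg, pos = feature_logit_effects([0.6, 0.2], toy_unembed, k=1)
    print("negative:", neg)
    print("positive:", pos)
```

In this toy setup, "Jedi" ends up at the top of the positive list and "softmax" at the bottom, mirroring the shape of the lists above without reproducing their actual values.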
Activation Density: 0.429%
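Assuming activation density means the share of token positions on which the feature's activation is above zero, a density of 0.429% works out to roughly 4 activating tokens per 1,000. A minimal sketch of that calculation follows; the function name `activation_density` and its `threshold` parameter are hypothetical, and a given dashboard may sample tokens or choose thresholds differently.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose feature activation
    is strictly greater than `threshold` (hypothetical definition)."""
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    # Count positions where the feature fires.
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


if __name__ == "__main__":
    # Feature fires on 1 of 4 toy token positions.
    print(activation_density([0.0, 0.0, 1.2, 0.0]))
```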