Explanations
references to user preferences or settings
user preferences
Negative Logits
Token      Logit
[missing]  -0.48
gang       -0.46
<bos>      -0.46
out        -0.46
army       -0.45
ll         -0.44
crimin     -0.44
Sint       -0.44
Johns      -0.42
Helico     -0.42
Positive Logits
Token         Logit
Preferences   1.67
preferences   1.66
Preferences   1.62
preferences   1.56
Preference    1.35
preference    1.34
Preference    1.28
preference    1.22
preferencias  1.19
preferencia   1.18
Activations Density: 0.006%