Explanations
legal, safe, and ethical boundaries
Negative Logits
खास           0.48
pretty        0.47
impatient     0.46
prettiest     0.43
особенно      0.43
elég          0.42
Easy          0.41
compactness   0.41
baš           0.41
Pretty        0.41
Positive Logits
legitimate    1.89
合法          1.69
legít         1.68
legitimately  1.63
lawful        1.59
lawfully      1.58
harmless      1.55
legitt        1.54
safely        1.40
safe          1.38
Activations Density: 0.575%