INDEX
Explanations
name followed by punctuation
Negative Logits
hello        0.78
Hello        0.70
hello        0.67
Hello        0.63
greeting     0.52
안녕하세요    0.52
пожалуйста   0.52
lütfen       0.51
bonjour      0.50
please       0.50
Positive Logits
!(     0.47
}!     0.44
!।     0.44
淖     0.43
![     0.43
.!     0.43
!!.    0.42
!»     0.42
!.     0.41
'!     0.41
Activations Density: 0.010%