INDEX
Explanations
statements, commentary, and denials
Negative Logits
সুরক্ষিত    0.48
Там         0.45
Protector   0.45
Healing     0.43
Protected   0.43
بنابراین    0.42
Need        0.42
Aware       0.42
Self        0.42
RAIL        0.42
Positive Logits
coment        0.57
komment       0.55
kepada        0.52
comment       0.51
specifics     0.51
koment        0.51
tvrd          0.49
yorum         0.48
komentar      0.48
politischen   0.48
Activations Density 0.014%