Explanations
words or phrases that indicate a ranking or rating
Negative Logits
ãģķãģĦ    -0.16
جا        -0.15
BAB       -0.14
äch       -0.14
hem       -0.14
scal      -0.14
Oak       -0.14
bab       -0.13
upy       -0.13
Ney       -0.13

Positive Logits
Persona   0.33
Persona   0.26
persona   0.26
golden    0.25
Golden    0.25
Person    0.24
persona   0.24
Golden    0.22
golden    0.22
Velvet    0.21
Activations Density 0.000%