INDEX
Explanations
assertive affirmations or strong opinions
Negative Logits
Lass          -0.59
domésticos    -0.58
تماد          -0.56
NoSuch        -0.56
Baer          -0.55
xampp         -0.51
secas         -0.51
่าว           -0.50
How           -0.50
antig         -0.49
Positive Logits
definately    0.92
"]);          0.88
definitely    0.87
Италијани     0.86
انيف          0.84
Definitely    0.84
`,            0.84
Definitely    0.83
"],           0.82
!!</          0.81
Activations Density: 0.080%