INDEX
Explanations
specific patterns of characters, potentially related to a particular language or type of text
Negative Logits
Cosponsors    -0.67
adelphia      -0.64
Mehran        -0.63
ratified      -0.61
agall         -0.60
*/(           -0.60
onut          -0.58
willingly     -0.58
lication      -0.58
issance       -0.56
Positive Logits
culosis       0.68
Ö             0.64
د             0.60
ibaba         0.60
lithium       0.58
zbek          0.58
¥µ            0.57
VALUE         0.57
iosity        0.55
schild        0.54
Activations Density: 13.455%