Explanations
quantities and their descriptors
Negative Logits
]             -0.60
betweenstory  -0.57
팎            -0.56
&_            -0.55
=>            -0.52
للغاية        -0.51
yled          -0.50
couverte      -0.50
]--;          -0.50
umumkan       -0.49
Positive Logits
people        0.82
times         0.74
folks         0.69
stuff         0.69
things        0.68
ppl           0.66
lotta         0.65
fois          0.64
times         0.60
fjspx         0.59
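The page does not say how the logit lists were produced. A common convention for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the highest- and lowest-scoring vocabulary tokens. The sketch below assumes that convention; `logit_effects` and `top_and_bottom` are hypothetical helpers, and the toy numbers stand in for real model weights.

```python
# Hypothetical sketch: assumes the logit lists come from projecting a feature's
# decoder direction through the unembedding matrix (not confirmed by the page).


def logit_effects(decoder_dir, w_u):
    """Return one score per vocabulary token, i.e. decoder_dir @ w_u.

    decoder_dir: list of d_model floats (the feature's decoder direction).
    w_u: d_model rows, each a list of vocab_size floats (the unembedding).
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # lowest scores, most negative first
    positive = ranked[::-1][:k]  # highest scores, most positive first
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
vocab = ["people", "things", "]", "=>"]
scores = logit_effects([1.0, -0.5], [[0.8, 0.6, -0.2, -0.4], [0.0, 0.0, 0.8, 0.2]])
positive, negative = top_and_bottom(scores, vocab, 2)
print([t for t, _ in positive])  # ['people', 'things']
print([t for t, _ in negative])  # [']', '=>']
```

On a real model the same ranking runs over the full vocabulary and the dashboard shows only the top ten at each end.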
Activation density: 0.202%
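Assuming "activation density" means the fraction of sampled token positions on which the feature's activation is above zero (the page states neither the sample nor the threshold), 0.202% corresponds to roughly 2 firing tokens in every 1,000. A minimal sketch of that calculation, using a hypothetical `activation_density` helper:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation is strictly above threshold.

    The zero threshold is an assumption; dashboards may use a different cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 1,000 token positions, of which the feature fires on two.
acts = [0.0] * 998 + [1.3, 0.7]
print(activation_density(acts))  # 0.002, i.e. 0.2%
```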