INDEX
Explanations
first-person statements or expressions of personal opinions and experiences
Negative Logits
alot        -0.90
loosing     -0.86
للمعارف     -0.84
thats       -0.84
atleast     -0.81
todays      -0.78
aint        -0.78
thru        -0.77
Lets        -0.76
dont        -0.76
Positive Logits
?           0.73
offenbar    0.64
ostensibly  0.64
!—          0.62
presumably  0.62
nominally   0.60
yalnızca    0.59
というわけで  0.59
ſhould      0.58
itſelf      0.58
Activations Density: 0.253%