Explanations
the presence of curly braces or brackets in the text
Negative Logits
ので: -0.55
-: -0.53
μέ: -0.53
ETRIC: -0.52
BorderRadius: -0.50
yar: -0.48
AT: -0.47
soát: -0.47
bro: -0.46
tat: -0.45
Positive Logits
Disliked: 0.95
ⓧ: 0.92
""}: 0.91
cauſe: 0.89
reaſon: 0.89
purpoſe: 0.89
themſelves: 0.88
pleaſure: 0.87
||}: 0.86
myſelf: 0.86
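The two logit lists rank vocabulary tokens by how strongly this feature raises or lowers their output logits. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder vector through the model's unembedding matrix and keep the k lowest and k highest scores. The minimal pure-Python sketch below uses hypothetical names (`logit_effects`, `top_and_bottom`, `decoder_dir`, `unembed`) and toy numbers, not this feature's real weights.

```python
# Hypothetical sketch: score each vocabulary token by the dot product of the
# feature's decoder direction with that token's unembedding column (assumed
# definition of "logit effect"), then take the k lowest and k highest.

def logit_effects(decoder_dir, unembed):
    """Return one score per vocab token.

    decoder_dir: list of d_model floats (assumed feature decoder vector).
    unembed: d_model rows, each a list of vocab_size floats (assumed W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, tokens, k):
    """Return (k most negative, k most positive) (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy usage with a 2-dimensional residual stream and a 4-token vocabulary.
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.4, 0.9, 0.0],
           [0.6, 0.2, 0.2, -1.0]]
scores = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(scores, tokens, 2)
```

In this toy case "b" and "a" come out most negative and "c" and "d" most positive, mirroring how the two lists above are ordered.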
Activation Density: 0.695%
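Activation density is usually read as the share of token positions on which the feature fires. Assuming "fires" means an activation strictly above zero (the threshold is an assumption, not stated on this page), the minimal sketch below computes it as a percentage. The function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percent of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption about what counts as active.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy usage: 5 active positions out of 1,000 gives a density of 0.5%.
density = activation_density([0.0] * 995 + [1.2] * 5)
```

Under this reading, a density of 0.695% means the feature is active on roughly 7 of every 1,000 tokens, consistent with a narrow feature such as one tied to braces and brackets.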