Explanations
sentences that discuss the complexity or structure of sentences, particularly regarding their length and grammatical elements
Negative Logits
pleaſure            -0.94
myſelf              -0.91
uſed                -0.89
setVerticalGroup    -0.88
himſelf             -0.87
ſelf                -0.86
themſelves          -0.86
purpoſe             -0.85
houſe               -0.83
Jefus               -0.83
Positive Logits
…                   0.61
nice                0.60
...                 0.59
ça                  0.58
b                   0.57
:)                  0.57
....                0.56
w                   0.55
shit                0.55
cool                0.54
Activations Density 0.027%
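
The density figure above is most plausibly the fraction of sampled token positions on which this feature fires at all. The page does not say how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is active;
    the source page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which have a non-zero activation.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.027% would correspond to roughly 27 active positions per 100,000 tokens, which marks the feature as sparse.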