INDEX
Explanations
phrases that establish relationships or separations, particularly in a context of oppositional or contrasting ideas
Negative Logits
Token       Logit
Efq         -0.85
quæ         -0.85
XK          -0.83
itſelf      -0.80
Laplacian   -0.80
pleaſure    -0.79
purpoſe     -0.78
Majefty     -0.76
becauſe     -0.74
abbildung   -0.74
Positive Logits
Token       Logit
OutOf       1.08
outta       1.06
the         0.89
INTO        0.87
Into        0.86
into        0.83
Dooley      0.79
off         0.74
Into        0.73
out         0.73
Activations Density 0.034%