INDEX
Explanations
phrases related to possession or lack thereof
repeated occurrences of the word "the."
Negative Logits
çīĪ            -0.78
isin           -0.74
ãĥį            -0.71
Layer          -0.69
=#             -0.68
Line           -0.67
instead        -0.65
periodically   -0.64
@@             -0.63
Rex            -0.63
Positive Logits
slightest      1.83
usual          1.22
same           1.10
nor            0.99
entirety       0.98
exact          0.95
smallest       0.94
specifics      0.91
hardest        0.89
anymore        0.89
Activations Density 0.340%