Explanations
expressing comparisons or feelings
Negative Logits
mselves     0.42
stesso      0.30
presumably  0.29
them        0.29
Histor      0.28
sekitar     0.28
subl        0.27
と同様      0.27
នូវ         0.27
as          0.26
Positive Logits
they      0.51
we        0.44
overkill  0.42
theres    0.39
there     0.39
theyre    0.39
будто     0.39
youre     0.38
it        0.37
wading    0.35
Activations Density: 0.036%