INDEX
Explanations: often, typically, generally
Negative Logits
۔	0.68
which	0.61
thats	0.61
.	0.61
andre	0.60
zur	0.59
восто	0.57
ہے۔	0.56
victoria	0.56
gato	0.55
Positive Logits
inherently	0.80
往往	0.78
সাধারণত	0.71
not	0.68
not	0.68
often	0.64
മാത്രമല്ല	0.63
基本的に	0.62
likely	0.62
intrinsically	0.60
Activations Density 0.009%