INDEX
Explanations
phrases indicating the significance or importance of various subjects or ideas
Negative Logits
anzi      -0.14
eref      -0.13
ắt        -0.13
¶Į        -0.13
mê        -0.13
ế         -0.13
ätz       -0.12
abwe      -0.12
alg       -0.12
аннÑı     -0.12
Positive Logits
importance      0.48
need            0.41
necessity       0.35
significance    0.33
value           0.33
Importance      0.33
dangers         0.31
need            0.31
import          0.31
centr           0.30
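The two lists above rank vocabulary tokens by how strongly this feature raises or lowers their logits. Below is a minimal sketch of how such lists can be produced. It assumes, as is typical for sparse-autoencoder feature dashboards but not confirmed by this page, that each score is the feature's decoder direction projected through the model's unembedding matrix. The functions `logit_effects` and `top_and_bottom` are hypothetical names used for illustration only.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Pair]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: unembed is d_model x vocab_size (row i = model dimension i).
    Returns one (token, score) pair per vocabulary entry, in vocab order.
    """
    scores = [0.0] * len(vocab)
    for d, row in zip(decoder_dir, unembed):
        for j, w in enumerate(row):
            scores[j] += d * w  # accumulate the dot product for token j
    return list(zip(vocab, scores))

def top_and_bottom(pairs: Sequence[Pair], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(pairs, key=lambda p: p[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    unembed = [[1.0, 0.0, -1.0],
               [0.0, 4.0, 0.0]]
    pos, neg = top_and_bottom(logit_effects([1.0, 0.5], unembed, vocab), k=1)
    print(pos, neg)  # [('b', 2.0)] [('c', -1.0)]
```

The sketch keeps results as a list of pairs rather than a dict because two distinct token ids can render as the same string (for example, with and without a leading space in a BPE vocabulary), which is presumably why `need` appears twice in the positive list.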
Activation Density: 0.187%
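Activation density is read here as the percentage of tokens on which the feature fires, so 0.187% corresponds to roughly 187 of every 100,000 tokens. A minimal sketch of that calculation follows; the function name `activation_density` and the rule "fires means activation > 0" are assumptions for illustration, since the page does not state its threshold.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

if __name__ == "__main__":
    print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 25.0
```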