Explanations
phrases that indicate details or distinct characteristics, emphasizing specificity in content
Negative Logits
Token     Logit
ony       -0.17
mere      -0.17
Boy       -0.16
ร         -0.15
ish       -0.15
hol       -0.15
oric      -0.15
boys      -0.15
entire    -0.15
iesel     -0.14
Positive Logits
Token     Logit
ities     0.20
-purpose  0.20
ially     0.19
biá»ĩt    0.18
sayıda    0.16
blr       0.15
ÑĮ        0.15
idades    0.15
ummings   0.15
ulty      0.15
Activations Density: 0.037%