Explanations
references to small objects or entities
instances of the word "little" or variations related to smallness
Negative Logits
Token          Logit
iership        -0.79
ï¸             -0.75
sem            -0.75
idents         -0.74
restling       -0.74
orthy          -0.73
inction        -0.72
ontent         -0.71
worthiness     -0.71
intent         -0.71
Positive Logits
Token          Logit
girl           1.10
boy            1.06
helper         0.99
sister         0.98
bit            0.98
brother        0.97
guy            0.92
kid            0.89
boys           0.87
girls          0.86
Activations Density 0.032%
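
To give a feel for how sparse this density is, the sketch below converts the reported percentage into an expected count of activating tokens. It assumes that "Activations Density" means the fraction of sampled tokens on which the feature has a nonzero activation; the page does not state this. The function name and the one-million-token sample size are hypothetical, chosen only for illustration.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature fires.

    Assumes density_percent is the share (in percent) of tokens
    with a nonzero activation; this reading is an assumption.
    """
    return density_percent / 100.0 * n_tokens


density_percent = 0.032   # value reported above
sample_size = 1_000_000   # hypothetical sample size, for illustration only

# Under the stated assumption, roughly 320 tokens per million activate.
print(round(expected_active_tokens(density_percent, sample_size)))
```

Under that reading, the feature fires on about 3 tokens in every 10,000, which fits a feature tied to a narrow family of "little"-style contexts.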