INDEX
Explanations
interpreting abstract concepts
Negative Logits
различные  1.20
العديد  1.19
számos  1.16
spezielle  1.14
বিভিন্ন  1.13
különböző  1.11
നിരവധി  1.11
特に  1.10
विभिन्न  1.10
hauptsächlich  1.10
Positive Logits
intellectually  1.01
somehow  1.00
subconsciously  0.99
unconsciously  0.97
emotionally  0.97
psychologically  0.91
implicitly  0.89
–  0.88
conceptually  0.83
consciously  0.82
Activations Density 0.455%