Explanations
Abstract concepts related to learning and education
Negative Logits
adora       -0.18
ador        -0.17
APPER       -0.15
oj          -0.15
442         -0.14
408         -0.14
342         -0.14
aha         -0.13
utor        -0.13
Ank         -0.13
Positive Logits
.)↵↵↵↵      0.16
ones        0.15
similarly   0.15
din         0.15
ôm          0.14
埋          0.14
puties      0.14
Burgess     0.14
.pref       0.14
هره         0.14
Activation density: 0.370%