INDEX
Explanations
words or phrases related to academic or educational settings
Negative Logits
Token     Logit
anke      -0.18
anst      -0.17
rze       -0.15
stype     -0.14
LOPT      -0.14
ipel      -0.14
жи        -0.14
476       -0.14
akis      -0.14
abay      -0.13
Positive Logits
Token          Logit
gad            0.17
ohl            0.15
fid            0.14
robe           0.14
EQUI           0.14
restless       0.14
agon           0.14
à¤ĵ            0.14
/Instruction   0.14
hee            0.13
Activations Density: 0.016%