INDEX
Explanations
references to academic achievements and educational pursuits
Negative Logits
aliz          -0.17
Instruction   -0.15
NP            -0.14
instruction   -0.14
rumor         -0.14
Instruction   -0.14
rumors        -0.14
ÑĢел          -0.14
rumored       -0.14
saber         -0.14
Positive Logits
Fd            0.25
sand          0.21
Foundation    0.21
sandwich      0.21
foundation    0.21
Pg            0.21
Sandwich      0.20
hon           0.20
Brun          0.19
Pg            0.19
Activations Density: 0.061%