Explanations
cryptic words or phrases
words related to various types of food or eating
Negative Logits
Zan             -0.64
Integrity       -0.64
Brill           -0.61
Kinnikuman      -0.59
aspirin         -0.59
deceptive       -0.57
initials        -0.57
illegitimate    -0.57
Barber          -0.57
Lazarus         -0.56
Positive Logits
ooth            1.00
ove             0.93
oop             0.92
uff             0.91
ypes            0.90
icz             0.89
ool             0.88
atch            0.88
ood             0.88
ont             0.87
Activations Density 0.405%