INDEX
Explanations
proper nouns related to people and places
repeated mentions of the word "rich."
Negative Logits
ŃĶ            -0.80
FY            -0.74
IENCE         -0.71
SCP           -0.67
Carry         -0.63
WAYS          -0.63
Cortex        -0.63
âĹ¼           -0.63
Ĥİ            -0.62
Kinnikuman    -0.61
Positive Logits
rich          1.05
mond          0.99
hardt         0.99
ards          0.97
ard           0.93
eton          0.92
ulas          0.87
lings         0.86
fried         0.84
heid          0.84
Activations Density 0.005%