INDEX
Explanations
short strings that include the word "work" as a part of a larger word or name
repeated instances of the term 'ork' in various forms
Negative Logits
ples      -0.75
tradem    -0.65
cit       -0.62
vanilla   -0.61
cardio    -0.59
charity   -0.58
behavi    -0.58
ciples    -0.57
EVA       -0.56
cence     -0.56
Positive Logits
ansas     1.15
hire      0.94
ozy       0.91
lift      0.89
patrick   0.85
osaurus   0.84
enhagen   0.82
ork       0.81
atana     0.80
owski     0.80
Activations Density 0.004%