INDEX
Explanations
proper nouns or specific entities
occurrences of the word "Found" with varying levels of significance
Negative Logits
Token            Logit
spoiler          -0.78
schedule         -0.75
override         -0.70
electronically   -0.67
reel             -0.67
ridicule         -0.66
divert           -0.66
withholding      -0.66
tense            -0.65
manual           -0.65
Positive Logits
Token            Logit
Found            3.71
Found            3.34
found            2.24
Reached          1.52
foundations      1.41
Identified       1.34
Noticed          1.34
Fell             1.30
found            1.26
Bought           1.25
Activations Density: 0.012%