Explanations
similar or identical words or phrases
instances of usage and similarity in language
Negative Logits
ropolis       -0.70
ursions       -0.64
ablishment    -0.64
lav           -0.64
impending     -0.64
riots         -0.64
irable        -0.63
needing       -0.62
ansion        -0.62
progress      -0.62
Positive Logits
technique      1.27
terminology    1.27
pseudonym      1.26
techniques     1.22
tactic         1.17
pronouns       1.09
analogy        1.07
tactics        1.07
euphem         1.04
method         1.04
Activations Density 0.355%