INDEX
Explanations
phrases related to personal thoughts, beliefs, and motivations
Negative Logits
Ha            -0.72
displayText   -0.68
latitude      -0.65
Principles    -0.63
VIDEOS        -0.63
utenberg      -0.61
Simulator     -0.59
TPP           -0.58
odore         -0.58
Facts         -0.58
Positive Logits
us            0.97
me            0.82
him           0.75
sear          0.69
employers     0.69
haunt         0.67
tered         0.67
ters          0.66
umin          0.66
awed          0.66
Activations Density: 0.132%